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ABSTRACT 


This  report  presents  a  collection  of  computer-generated  statistical  distributions  which  are  useful  for  performing 
Monte  Carlo  simulations.  The  distributions  are  encapsulated  into  a  C++  class,  called  “Random,”  so  that  they  can  be 
used  with  any  C++  program.  The  class  currently  contains  27  continuous  distributions,  9  discrete  distributions, 
data-driven  distributions,  bivariate  distributions,  and  number-theoretic  distributions.  The  class  is  designed  to  be 
flexible  and  extensible,  and  this  is  supported  in  two  ways:  (1)  a  function  pointer  is  provided  so  that  the 
user-programmer  can  specify  an  arbitrary  probability  density  function,  and  (2)  new  distributions  can  be  easily  added 
by  coding  them  directly  into  the  class.  The  format  of  the  report  is  designed  to  provide  the  practitioner  of  Monte 
Carlo  simulations  with  a  handy  reference  for  generating  statistical  distributions.  However,  to  be  self-contained, 
various  techniques  for  generating  distributions  are  also  discussed,  as  well  as  procedures  for  estimating  distribution 
parameters  from  data.  Since  most  of  these  distributions  rely  upon  a  good  underlying  uniform  distribution  of  random 
numbers,  several  candidate  generators  are  presented  along  with  selection  criteria  and  test  results.  Indeed,  it  is  noted 
that  one  of  the  more  popular  generators  is  probably  overused  and  under  what  conditions  it  should  be  avoided. 
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1.  SUMMARY 

This  report  presents  a  collection  of  various  distributions  of  random  numbers,  suitable  for  performing  Monte  Carlo 
simulations.  They  have  been  organized  into  a  C++  class,  called  “Random,”  which  is  callable  from  any  C++ 
program.  Using  the  Random  class  is  very  simple.  For  example,  the  following  is  source  code  to  print  1,000  normal 
variates  with  a  mean  of  zero  and  a  variance  of  one. 


//  Sample  program  for  using  the  Random 
# include  <iostream.h> 

# include  "Random. h"  <= 

void  main(  void  ) 

{ 

Random  rv;  <= 


class 

include  the  definition  of  the  Random  class 

declare  a  random  variate 


} 


for  (  int  i  =  0;  i  <  1000;  i++  ) 
cout  «  rv . normal ( ) 

«  endl; 


reference  the  normal  distribution  (with  default  parameters ) 


There  are  various  aspects  that  the  programmer  will  wish  to  know  at  this  point,  such  as  how  the  random  number  seed 
is  set  and  how  to  compile  and  link  the  sample  program.  These  aspects  are  discussed  later  (see  Appendix  B).  The 
point  to  be  emphasized  here  is  that  the  Random  class  is  very  easy  and  straightforward  to  use.  The  class  itself  is  quite 
comprehensive,  currently  containing  27  continuous  distributions,  9  discrete  distributions,  distributions  based  on 
empirical  data,  and  bivariate  distributions,  as  well  as  distributions  based  on  number  theory.  Moreover,  it  allows  the 
user-programmer  to  specify  an  arbitrary  function  or  procedure  to  use  for  generating  distributions  that  are  not  already 
in  the  collection.  It  is  also  shown  that  it  is  very  easy  to  extend  the  collection  to  include  new  distributions. 


2.  INTRODUCTION 

This  report  deals  with  random  number  distributions,  the  foundation  for  performing  Monte  Carlo  simulations. 
Although  Lord  Kelvin  may  have  been  the  first  to  use  Monte  Carlo  methods  in  his  1901  study  of  the  Boltzmann 
equation  in  statistical  mechanics,  their  widespread  use  dates  back  to  the  development  of  the  atomic  bomb  in  1944. 
Monte  Carlo  methods  have  been  used  extensively  in  the  field  of  nuclear  physics  for  the  study  of  neutron  transport 
and  radiation  shielding.  They  remain  useful  whenever  the  underlying  physical  law  is  either  unknown  or  it  is  known 
but  one  cannot  obtain  enough  detailed  information  in  order  to  apply  it  directly  in  a  deterministic  manner.  In 
particular,  the  field  of  operations  research  has  a  long  history  of  employing  Monte  Carlo  simulations.  There  are 
several  reasons  for  using  simulations,  but  they  basically  fall  into  three  categories. 

•  To  Supplement  Theory 

While  the  underlying  process  or  physical  law  may  be  understood,  an  analytical  solution — or  even  a  solution  by 
numerical  methods— may  not  be  available.  In  addition,  even  in  the  cases  where  we  possess  a  deterministic 
solution,  we  may  be  unable  to  obtain  the  initial  conditions  or  other  information  necessary  to  apply  it. 

•  To  Supplement  Experiment 

Experiments  can  be  very  costly  or  we  may  be  unable  to  perform  the  measurements  required  for  a  particular 
mathematical  model. 

•  Computing  Power  has  Increased  while  Cost  has  Decreased 

In  1965,  when  writing  an  article  for  Electronics  magazine,  Gordon  Moore  formulated  what  has  since  been 
named  Moore’s  Law:  the  number  of  components  that  could  be  squeezed  onto  a  silicon  chip  would  double  every 
year.  Moore  updated  this  prediction  in  1975  from  doubling  every  year  to  doubling  every  two  years.  These 
observations  proved  remarkably  accurate;  the  processing  technology  of  1996,  for  example,  was  some  eight 
million  times  more  powerful  than  that  of  1966  [Helicon  Publishing  1999]. 

In  short,  computer  simulations  are  viable  alternatives  to  both  theory  and  experiment  and  we  have  every  reason  to 
believe  they  will  continue  to  be  so  in  the  future.  A  reliable  source  of  random  numbers,  and  a  means  of  transforming 
them  into  prescribed  distributions,  is  essential  for  the  success  of  the  simulation  approach.  This  report  describes 
various  ways  to  obtain  distributions,  how  to  estimate  the  distribution  parameters,  descriptions  of  the  distributions, 
choosing  a  good  uniform  random  number  generator,  and  some  illustrations  of  how  the  distributions  may  be  used. 
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3.  METHODS  FOR  GENERATING  RANDOM  NUMBER  DISTRIBUTIONS 

We  wish  to  generate  random  numbers,*  x ,  that  belong  to  some  domain,  x  e  [ xmm ,  xmax],  in  such  a  way  that  the  fre¬ 
quency  of  occurrence,  or  probability  density,  will  depend  upon  the  value  of  x  in  a  prescribed  functional  form  f(x). 
Here,  we  review  several  techniques  for  doing  this.  We  should  point  out  that  all  of  these  methods  presume  that  we 
have  a  supply  of  uniformly  distributed  random  numbers  in  the  half-closed  unit  inteval  [0, 1).  These  methods  are 
only  concerned  with  transforming  the  uniform  random  variate  on  the  unit  interval  into  another  functional  form.  The 
subject  of  how  to  generate  the  underlying  uniform  random  variates  is  discussed  in  Appendix  A. 

We  begin  with  the  inverse  transformation  technique,  as  it  is  probably  the  easiest  to  understand  and  is  also  the  method 
most  commonly  used.  A  word  on  notation:  f(x)  is  used  to  denote  the  probability  density  and  F(x)  is  used  to  denote 
the  cumulative  distribution  function  (see  the  Glossary  for  a  more  complete  discussion). 

3.1  Inverse  Transformation 

If  we  can  invert  the  cumulative  distribution  function  F(x),  then  it  is  a  simple  matter  to  generate  the  probability  den¬ 
sity  function  f(x).  The  algorithm  for  this  technique  is  as  follows. 

(1)  Generate  U  ~  t/(0, 1). 

(2)  Return  X  =  F~l(U). 

It  is  not  difficult  to  see  how  this  method  works,  with  the  aid  of  Figure  1. 


Figure  1.  Inverse  Transform  Method. 

We  take  uniformly  distributed  samples  along  the  y  axis  between  0  and  1.  We  see  that,  where  the  distribution  func¬ 
tion  F (x)  is  relatively  steep,  there  will  result  a  high  density  of  points  along  the  x  axis  (giving  a  larger  value  of  f(x)), 
and,  on  the  other  hand,  where  F(x)  has  a  relatively  shallow  slope,  there  will  result  in  a  corresponding  lower  density 
of  points  along  the  x  axis  (giving  a  smaller  value  of  /(*)).  More  formally,  if 

*  Of  course,  all  such  numbers  generated  according  to  precise  and  specific  algorithms  on  a  computer  are  not  truly  random  at  all  but  only 
exhibit  the  appearance  of  randomness  and  are  therefore  best  described  as  “pseudo- random.”  However,  throughout  this  report,  we  use  the 
term  “random  number”  as  merely  a  shorthand  to  signify  the  more  correct  term  of  “pseudo-random  number.” 
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x  =  F-\y), 


(1) 


where  F(x)  is  the  indefinite  integral  F(x)  =  J  f(t)dt  of  the  desired  density  function  f(x),  then  y  =  F(x)  and 

“CO 

£=/<*>.  ® 

ax 

This  technique  can  be  illustrated  with  the  Weibull  distribution.  In  this  case,  we  have  F(x)  =  1  —  e  .  So,  if 
U  ~  1/(0, 1)  and  U  =  F(X),  then  we  find*  X  =  b  [-ln(l  ~U)]Vc. 

The  inverse  transform  method  is  a  simple,  efficient  technique  for  obtaining  the  probability  density,  but  it  requires 
that  we  be  able  to  invert  the  distribution  function.  As  this  is  not  always  feasible,  we  need  to  consider  other  tech¬ 
niques  as  well. 


3.2  Composition 

This  technique  is  a  simple  extension  of  the  inverse  transformation  technique.  It  applies  to  a  situation  where  the 
probability  density  function  can  be  written  as  a  linear  combination  of  simpler  composition  functions  and  where  each 
of  the  composition  functions  has  an  indefinite  integral  that  is  invertible.1  Thus,  we  consider  cases  where  the  density 
function  f(x)  can  be  expressed  as 

f(x)  =  '£Pifi(x),  (3) 

1=1 


where 


±Pi= i 

/  =  i 

and  each  of  the  has  an  indefinite  integral,  F,(x)  with  a  known  inverse.  The  algorithm  is  as  follows. 


(1)  Select  index  i  with  probability  p,  . 

(2)  Independently  generate  U  ~  U(0, 1). 

(3)  Return  X  =  FJl(U). 

For  example,  consider  the  density  function  for  the  Laplace  distribution  (also  called  the  double  exponential  distribu¬ 
tion): 


This  can  also  be  written  as 


/(*)  =  ^exp|- 


\x 

~~r) 


f(x)  =  ^Mx)  +  ^h(x), 


(5) 

(6) 


where 


x  <  a 

x  >  a 


0 


and 


fiW  =  j 


1 

-exp 


x-a 


x  <  a 


x  >a 


(7) 


Now  each  of  these  has  an  indefinite  integral,  namely 


*  Since  1  -U  has  precisely  the  same  distribution  as  U,  in  practice,  we  use  X  =  b(-]nU)llc,  which  saves  a  subtraction  and  is  therefore 
slightly  more  efficient. 

t  The  composition  functions  /,  must  be  defined  on  disjoint  intervals,  so  that  if  //<*)  >  0,  then  fj(x)  =  0  for  all  a:  whenever  j  *  i.  That  is, 
there  is  no  overlap  between  the  composition  functions. 
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fi(*H 


exp 


x-a^ 

b  ) 


x  <  a 


x  >o 


and  F2(x)  = 


1  -  exp 


x-a 


x  <  a 


x>a 


that  is  invertible.  Since  px-  p2  =  1/2,  we  can  select  U\~U(  0,1)  and  set 


i  =  - 


1  if  U{  >  1/2 


2  if  Ui  <  1/2 

Independently,  we  select  U2  ~  1/(0, 1)  and  then,  using  the  inversion  technique  of  section  3.1, 


la  +  b\nU2  if  /  =  1 
|a-/?lnt/2  if /  =  2  * 


(8) 


(9) 


(10) 


3.3  Convolution 

If  X  and  Y  are  independent  random  variables  from  known  density  functions  fx(x)  and  fY(y ),  then  we  can  generate 
new  distributions  by  forming  various  algebraic  combinations  of  X  and  Y .  Here,  we  show  how  this  can  be  done  via 
summation,  multiplication,  and  division.  We  only  treat  the  case  when  the  distributions  are  independent — in  which 
case,  the  joint  probability  density  function  is  simply  f(x,  y)  =  fx(x)fY(y ).  First  consider  summation.  The  cumula¬ 
tive  distribution  is  given  by 

F x+y(u)  =  J  J  fix,  y )  dx  dy  (1 1) 

x+y<u 


f(x,y)dy 


dx  . 


— oo  \^  =  -oo 


J 


(12) 


The  density  is  obtained  by  differentiating  with  respect  to  u ,  and  this  gives 


us  the  convolution  formula  for  the  sum 


oo 

fx+yM  =  j  fix,  u  -  x)dx  , 

-oo 


(13) 


where  we  used  Leibniz’s  rule  (see  Glossary)  to  carry  out  the  differentiation  (first  on  x  and  then  on  y).  Notice  that,  if 
the  random  variables  are  nonnegative,  then  the  lower  limit  of  integration  can  be  replaced  with  zero,  since  fx(x)  =  0 
for  all  x  <  0,  and  the  upper  limit  can  be  replaced  with  k,  since  fY{u  -  x)  =  0  for  x  >  u. 

Let  us  apply  this  formula  to  the  sum  of  two  uniform  random  variables  on  [0, 1].  We  have 

oo 

fx+Y  («)  =  J  fix)fiu  -x)dx  .  (14) 

—oo 


Since  f(x)  =  1  when  0  <  x  <  1,  and  is  zero  otherwise,  we  have 

1  u  r 

fx+rM=  f  f(u-x)dx=  \f(t)dt  =  \U  ““  ,  (15) 

J  J  2-u  1 <  u  <2 

0  u-l  l 

and  we  recognize  this  as  a  triangular  distribution  (see  section  5.1.24).  As  another  example,  consider  the  sum  of  two 
independent  exponential  random  variables  with  location  a  =  0  and  scale  b.  The  density  function  for  the  sum  is 

Z  z 

fx+riz)  =  J  fxix)fy iz-x)dx  =  J  i  e~x/b  i  e^b  dx  =  ±  ze~7jh  .  (16) 

0  0 

Using  mathematical  induction,  it  is  straightforward  to  generalize  to  the  case  of  n  independent  exponential  random 
variates: 
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(17) 


^n— 1  g-xlb 

/*,+•-+*.(*)  =  M  =  gamma  (0,  b,  n), 

where  we  recognized  this  density  as  the  gamma  density  for  location  parameter  a  =  0,  scale  parameter  b,  and  shape 
parameter  c  =  n  (see  section  5.1.1 1). 

Thus,  the  convolution  technique  for  summation  applies  to  a  situation  where  the  probability  distribution  may  be  writ¬ 
ten  as  a  sum  of  other  random  variates,  each  of  which  can  be  generated  directly.  The  algorithm  is  as  follows. 

(1)  Generate  Xt  ~  F~l{U)  for  i  =  1, 2,  •  •  • ,  n. 

(2)  Set  X  =  Xj  +  X2  +  •  •  •  +  Xn. 

To  pursue  this  a  bit  further,  we  can  derive  a  result  that  will  be  useful  later.  Consider,  then,  the  Erlang  distribution;  it 
is  a  special  case  of  the  gamma  distribution  when  the  shape  parameter  c  is  an  integer.  From  the  aforementioned  dis¬ 
cussion,  we  see  that  this  is  the  sum  of  c  independent  exponential  random  variables  (see  section  5.1.8),  so  that 

X  =  -hlnX! - MnXc=-hln(X1---Xc).  (18) 

This  shows  that  if  we  have  c  HD  exponential  variates,  then  the  Erlang  distribution  can  be  generated  via 

X  =  -b\nflX,.  (19) 

i=i 

Random  variates  may  be  combined  in  ways  other  than  summation.  Consider  the  product  of  X  and  Y .  The  cumula¬ 
tive  distribution  is 

Fxr(u)  =  {  J  /(■*>  y)  dx  dy  (20> 

xy£u 


-l 


uJx 

J  f(x,y)dy 


dx, 


Once  again,  the  density  is  obtained  by  differentiating  with  respect  to  u: 

00  1 

/xr(«)=  J  f(x,u/x)-dx. 

-CO 


Let  us  apply  this  to  the  product  of  two  uniform  densities.  We  have 

CO  j 

/xr(«)  =  |  f(x)f(u/x)  -  dx  . 

-oo 


(21) 


(22) 


(23) 


On  the  unit  interval,  f(x)  is  zero  when  x  >  1  and  f(u/x)  is  zero  when  x  <u.  Therefore, 

i  . 

fxr(u)  =  j  -  dx  =  -ln«.  (24) 

U 

This  shows  that  the  log  distribution  can  be  generated  as  the  product  of  two  IID  uniform  variates  (see  section  5.1.13). 
Finally,  let’s  consider  the  ratio  of  two  variates: 

Fy/xM  =  J  J  f(x>  y)  dx  dy  (25) 

y/x£u 


Differentiating  this  to  get  the  density. 


OO  | 

r  ux  \ 

-J 

J  f(x,y)dy 

-oo 

U=-°o  J 

(26) 
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(27) 


oo 

/ra(«)  =  |  f(x,ux)\x\dx. 

-oo 

As  an  example,  let  us  apply  this  to  the  ratio  of  two  normal  variates  with  mean  0  and  variance  1.  We  have 

OO  °° 

fr/x(“)  =  J  /(*)/(«*)  \x\dx  =  —  J  <f*2/V"V/2  Ixl  dx  , 

-oo  -oo 

and  we  find  that 


(28) 


oo 

frM  =  ~  J  e~^2a  xdx=  — i-j-  .  (29) 

K  J  ^(1+W2) 

This  is  recognized  as  a  Cauchy  distribution  (see  section  5.1.3). 

3.4  Acceptance-Rejection 

Whereas  the  previous  techniques  are  direct  methods,  this  is  an  indirect  technique  for  generating  the  desired 
distribution.  It  is  a  more  general  method,  which  can  be  used  when  more  direct  methods  fail;  however,  it  is  generally 
not  as  efficient  as  direct  methods.  Its  basic  virtue  is  that  it  will  always  work — even  for  cases  where  there  is  no 
explicit  formula  for  the  density  function  (as  long  as  there  is  some  way  of  evaluating  the  density  at  any  point  in  its 
domain).  The  technique  is  best  understood  geometrically.  Consider  an  arbitrary  probability  density  function,  /(*), 
shown  in  Figure  2.  The  motivation  behind  this  method  is  the  simple  observation  that,  if  we  have  some  way  of 
generating  uniformly  distributed  points  in  two  dimensions  under  the  curve  of  f(x ),  then  the  frequency  of  occurrence 
of  the  x  values  will  have  the  desired  distribution. 


m 


Figure  2.  Probability  Density  Generated  From  Uniform  Areal  Density. 

A  simple  way  to  do  this  is  as  follows. 

(1)  Select X  ~U (*min, 

^max)* 

(2)  Independently  select Y  ~U (ymin, ymax). 

(3)  Accept  X  if  and  only  if  Y  <  f(X). 

This  illustrates  the  idea,  and  it  will  work,  but  it  is  inefficient  due  to  the  fact  that  there  may  be  many  points  that  are 
enclosed  by  the  bounding  rectangle  that  lie  above  the  function.  So  this  can  be  made  more  efficient  by  first  finding  a 
function  /  that  majorizes  f(x),  in  the  sense  that  fix)  >  f(x)  for  all  x  in  the  domain,  and,  at  the  same  time,  the  inte¬ 
gral  of  f  is  invertible.  Thus,  let 

x  -*max 

F(x)  =  J  f{x)  dx  and  define  Amax  -  J  fix)  dx  .  (30) 

■*min  -*min 

Then  the  more  efficient  algorithm  is  as  follows. 
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(1)  Select  A  ~U(0,  A^). 

(2)  Compute  X  =  /-1(A). 

(3)  Independently  select  Y  ~U(0,  f(X)). 

(4)  Accept  X  if  and  only  ifF  <  f(X). 

The  acceptance-rejection  technique  can  be  illustrated  with  the  following  example.  Let  f(x)  =  10, 296  x5(l  -  x)1 .  It 
would  be  very  difficult  to  use  the  inverse  transform  method  upon  this  function,  since  it  would  involve  finding  the 
roots  of  a  13th  degree  polynomial.  From  calculus,  we  find  that  f(x)  has  a  maximum  value  of  2.97187  at  x  =  5/12. 
Therefore,  the  function  /(*)  =  2.97187  majorizes  /(*).  So,  with  A,^  =  2.97187,  F (x)  =  2. 97187 x,  and 
3W  =  2. 97187,  the  algorithm  is  as  follows. 

(1)  Select  A  ~  17(0, 2. 97187). 

(2)  Compute  X  =  A/ 2. 97 1 87. 

(3)  Independently  select  Y  -1/(0, 2. 97187). 

(4)  Accept  X  if  and  only  ifF  <  f(X). 

3.5  Sampling  and  Data-Driven  Techniques 

One  very  simple  technique  for  generating  distributions  is  to  sample  from  a  given  set  of  data.  The  simplest  technique 
is  to  sample  with  replacement,  which  effectively  treats  the  data  points  as  independent.  The  generated  distribution  is 
a  synthetic  data  set  in  which  some  fraction  of  the  original  data  is  duplicated.  The  bootstrap  method  (Diaconis  and 
Efron  1983)  uses  this  technique  to  generate  bounds  on  statistical  measures  for  which  analytical  formulas  are  not 
known.  As  such,  it  can  be  considered  as  a  Monte  Carlo  simulation  (see  section  3.7)  We  can  also  sample  without 
replacement,  which  effectively  treats  the  data  as  dependent.  A  simple  way  of  doing  this  is  to  first  perform  a  random 
shuffle  of  the  data  and  then  to  return  the  data  in  sequential  order.  Both  of  these  sampling  techniques  are  discussed  in 
section  5.3.3. 

Sampling  empirical  data  works  well  as  far  as  it  goes.  It  is  simple  and  fast,  but  it  is  unable  to  go  beyond  the  data 
points  to  generate  new  points.  A  classic  example  that  illustrates  its  limitation  is  the  distribution  of  darts  thrown  at  a 
dart  board.  If  a  bull’s  eye  is  not  contained  in  the  data,  it  will  never  be  generated  with  sampling.  The  standard  way  to 
handift  this  is  to  first  fit  a  known  density  function  to  the  data  and  then  draw  samples  from  it.  The  question  arises  as 
to  whether  it  is  possible  to  make  use  of  the  data  directly  without  having  to  fit  a  distribution  beforehand,  and  yet 
return  new  values.  Fortunately,  there  is  a  technique  for  doing  this.  It  goes  by  the  name  of  “data-based  simulation” 
or,  the  namft  preferred  here, “stochastic  interpolation.”  This  is  a  more  sophisticated  technique  that  will  generate  new 
data  points,  which  have  the  same  statistical  properties  as  the  original  data  at  a  local  level,  but  without  having  to  pay 
the  price  of  fitting  a  distribution  beforehand.  The  underlying  theory  is  discussed  in  (Taylor  and  Thompson  1986; 
Thompson  1989;  Bodt  and  Taylor  1982)  and  is  presented  in  section  5.3.4. 

3.6  Techniques  Based  on  Number  Theory 

Number  theory  has  been  used  to  generate  random  bits  of  0  and  1  in  a  very  efficient  manner  and  also  to  produce 
quasi-random  sequences.  The  latter  are  sequences  of  points  that  take  on  the  appearance  of  randomness  while,  at  the 
same  time,  possessing  other  desirable  properties.  Two  techniques  are  included  in  this  report. 

1 .  Primitive  Polynomials  Modulo  Two 

These  are  useful  for  generating  random  bits  of  l’s  and  0’s  that  cycle  through  all  possible  combinations  (exclud¬ 
ing  all  zeros)  before  repeating.  This  is  discussed  in  section  5.5.1. 

2.  Prime  Number  Theory 

This  has  been  exploited  to  produce  sequences  of  quasi-random  numbers  that  are  self-avoiding.  This  is  dis¬ 
cussed  in  section  5.5.2. 

3.7  Monte  Carlo  Simulation 

Monte  Carlo  simulation  is  a  very  powerful  technique  that  can  be  used  when  the  underlying  probability  density  is 
unknown,  or  does  not  come  from  a  known  function,  but  we  have  a  model  or  method  that  can  be  used  to  simulate  the 
desired  distribution.  Unlike  the  other  techniques  discussed  so  far,  there  is  not  a  direct  implementation  of  this  method 
in  section  5,  due  to  its  generality.  Instead,  we  use  this  opportunity  to  illustrate  this  technique.  For  this  purpose,  we 
use  an  example  that  occurs  in  fragment  penetration  of  plate  targets. 
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Consider  a  cube  of  side  length  a,  material  density  p,  and  mass  m  =  pa3.  Its  geometry  is  such  that  one,  two,  or,  at 
most,  three  sides  will  be  visible  from  any  direction.  Imagine  the  cube  situated  at  the  origin  of  a  cartesian  coordinate 
system  with  its  face  surface  normals  oriented  along  each  of  the  coordinate  axes.  Then  the  presented  area  of  the  cube 
can  be  parametrized  by  the  polar  angle  0  and  the  azimuthal  angle  <p.  Defining  a  dimensionless  shape  factor  y  by 

Ap  =  y(m/p)312,  (31) 

where  Ap  is  the  presented  area,  we  find  that  the  dimensionless  shape  factor  is 

7(0,  <j>)  =  sin  0  cos  ^  +  sin  0  sin  </>  +  cos  0.  (32) 

It  is  sufficient  to  let  0  e  [0,  tt/2)  and  </>  e  [0,  ni 2)  in  order  for  y  to  take  on  all  possible  values.  Once  we  have  this 
parametrization,  it  is  a  simple  matter  to  directly  simulate  the  shape  factor  according  to  the  following  algorithm. 

(1)  Generate  (0,  <p)  ~  uniformSpherical  (0,  k/2,  0,  nil). 

(2)  Return  y  -  sin  0  cos  (/>  +  sin  0  sin  <j>  +  cos  0. 

Figure  3  shows  a  typical  simulation  of  the  probability  density  f(y). 
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Figure  3.  Shape  Factor  Probability  Density  of  a  Randomly  Oriented  Cube  via  Monte  Carlo  Simulation. 
Incidently,  we  find  that 

Output  =  ye  [1,V3), 

Mean  =  3/2,  and 

Variance  =  Afn  -  5/4. 
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3.8  Correlated  Bivariate  Distributions 

If  we  need  to  generate  bivariate  distributions  and  the  variates  are  independent,  then  we  simply  generate  the  distribu¬ 
tion  for  each  dimension  separately.  However,  there  may  be  known  correlations  between  the  variates.  Here,  we  show 
how  to  generate  correlated  bivariate  distributions. 

To  generate  correlated  random  variates  in  two  dimensions,  the  basic  idea  is  that  we  first  generate  independent  vari¬ 
ates  and  then  perform  a  rotation  of  the  coordinate  system  to  bring  about  the  desired  correlation,  as  shown  in  Figure 
4. 


/ 

\ 
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Figure  4.  Coordinate  Rotation  to  Induce  Correlations. 


The  transformation  between  die  two  coordinate  systems  is  given  by 

x'  =  xcos0  +  ysin0  and  /  =  - xsinP  +  ycostf .  (33) 

Setting  the  correlation  coefficient  p  =  cos  0  so  that 

x'  =  px +i/l  —  p1  y  (34) 

induces  the  desired  correlation.  To  check  this, 

corr(x,  x')  =  p  corr(x,  x)  +  Vl  -  p1  corr(x,  y)  =  p  (1)  +  V 1 -  P2  (°)  =  P»  @5) 

since  corr(*,  x)  =  1  and  corr(x,  y )  =  0. 

Here  are  some  special  cases: 

0  =  0  p  =  1  x'  =  x 

<  0  =  tc/2  p  =  0  x'  is  independent  of  x  .  (36) 

0  =  n  p  =  - 1  x'  =  -x 

Thus,  the  algorithm  for  generating  correlated  random  variables  (x,  x'),  with  correlation  coefficient  p,  is  as  follows. 

(1)  Independently  generate  X  and  Y  (from  the  same  distribution). 

(2)  Set  X'  =  p  X  +  V 1  -  P2  F. 

(3)  Return  the  correlated  pair  (X,  X'). 


3.9  Truncated  Distributions 

Consider  a  probability  density  function  /(x)  defined  on  some  interval  and  suppose  that  we  want  to  truncate  the  dis¬ 
tribution  to  the  subinterval  [a,  b].  This  can  be  accomplished  by  defining  a  truncated  density: 


f(x)  = 


fix) 


Fib)-F{a) 


0 


a<  x<b 
otherwise 


(37) 
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which  has  corresponding  truncated  distribution 


0 

F(x)-F(a) 


F(b)-F(a) 

1 


x  <  a 

a<  x<  b  . 
x  >  b 


An  algorithm  for  generating  random  variates  having  distribution  function  F  is  as  follows. 

(1)  Generate  U  ~U (0,1). 

(2)  Set  Y  =  F(a)  +  [F(b)  -  F(a)]  U. 

(3)  Return  X  =  F_1(y). 


(38) 


This  method  works  well  with  the  inverse-transform  method.  However,  if  an  explicit  formula  for  the  function  F  is 
not  available  for  forming  the  truncated  distribution  given  in  equation  (38),  or  if  we  do  not  have  an  explicit  formula 
for  F~l ,  then  a  less  efficient  but  nevertheless  correct  method  of  producing  the  truncated  distribution  is  the  following 
algorithm. 

(1)  Generate  a  candidate  X  from  the  distribution  F . 

(2)  If  a  <  X  <  b,  then  accept  X;  otherwise,  go  back  to  step  1 . 

This  algorithm  essentially  throws  away  variates  that  lie  outside  the  domain  of  interest. 


4.  PARAMETER  ESTIMATION 

The  distributions  presented  in  section  5  have  parameters  that  are  either  known  or  have  to  be  estimated  from  data.  In 
the  case  of  continuous  distributions,  these  may  include  the  location  parameter,  a;  the  scale  parameter,  b;  and/or  the 
shape  parameter,  c.  In  some  cases,  we  need  to  specify  the  range  of  the  random  variate,  xmin  and  xmax.  In  the  case  of 
the  discrete  distributions,  we  may  need  to  specify  the  probability  of  occurrence,  p,  and  the  number  of  trials,  n.  Here, 
we  show  how  these  parameters  may  be  estimated  from  data  and  present  two  techniques  for  doing  this. 

4.1  Linear  Regression  (Least-Squares  Estimate) 

Sometimes,  it  is  possible  to  linearize  the  cumulative  distribution  function  by  transformation  and  then  to  perform  a 
multiple  regression  to  determine  the  values  of  the  parameters.  It  can  best  be  explained  with  an  example.  Consider 
the  Weibull  distribution  with  location  a  =  0: 

F(x)  =  1  -exp[-(x/h)c] .  (39) 

We  first  sort  the  data  .v,  in  accending  order: 

Xi<x2<x3<--<xN  .  (40) 

The  corresponding  cumulative  probability  is  F(xt)  =  F,  =  i/N.  Rearranging  eq.  (39)  so  that  the  parameters  appear 
linearly,  we  have 


ln[-ln(l  —F()]  =  clnx,- -clnb  .  (41) 

This  shows  that  if  we  regress  the  left-hand  side  of  this  equation  against  the  logarithms  of  the  data,  then  we  should 
get  a  straight  line.*  The  least-squares  fit  will  give  the  parameter  c  as  the  slope  of  the  line  and  the  quantity  -c  In  b  as 
the  intercept,  from  which  we  easily  determine  b  and  c. 


4.2  Maximum  Likelihood  Estimation 

In  this  method,  we  assume  that  the  given  data  came  from  some  underlying  distribution  that  contains  a  parameter  j3 
whose  value  is  unknown.  The  probability  of  getting  the  observed  data  with  the  given  distribution  is  the  product  of 
the  individual  densities: 


_  L(P)  =  ffi{Xx)fp{X2)---f^(XN).  (42) 

We  should  note  that  linearizing  the  cumulative  distribution  will  also  transform  the  error  term.  Normally  distributed  errors  will  be  trans¬ 
formed  into  something  other  than  a  normal  distribution.  However,  the  error  distribution  is  rarely  known,  and  assuming  it  is  Gaussian  to 
begin  with  is  usually  no  more  than  an  act  of  faith.  See  the  chapter  “Modeling  of  Data”  in  Press  et  al.  (1992)  for  a  discussion  of  this  point. 
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The  value  of  p  that  maximizes  L(P)  is  the  best  estimate  in  the  sense  of  maximizing  the  probability.  In  practice,  it  is 
easier  to  deal  with  the  logarithm  of  the  likelihood  function  (which  has  the  same  location  as  the  likelihood  function 

itself). 

As  an  example,  consider  the  lognormal  distribution.  The  density  function  is 


fhz 


ox 


-exp 


(In  x-nf 
2a 2 


x>0 

otherwise 


(43) 


The  log-likelihood  function  is 

AT  W  _ 

In  L(fi,  o1)  =  lnll  /^O*,)  =  X 1® /*„*(•*/) 

i= i  i= i 


(44) 


and,  in  this  case. 


\nL(p,a2) 


;=1 


ln(V2^<T2x,)  + 


(In  x,  -  fif 
2a2 


This  is  a  maximuum  when  both 

din  L(ju,  a2)  „  J  3  In  L(n,o2) 

- - =  0  and  - r— » - =  vj 

9  fi  da 2 


(45) 


(46) 


and  we  find 

A  =  T72ln*i  311(1  <72  =  T7  20nx,-/z)2 .  (47) 

N  imi  /V  I=i 

Thus,  maximum  likelihood  parameter  estimation  leads  to  a  very  simple  procedure  in  this  case.  First,  take  the  loga¬ 
rithms  of  all  the  data  points.  Then,  p  is  the  sample  mean,  and  a1  is  the  sample  variance. 


5.  PROBABILITY  DISTRIBUTION  FUNCTIONS 

In  this  section,  we  present  the  random  number  distributions  in  a  form  intended  to  be  most  useful  to  die  actual  prac- 
tioner  of  Monte  Carlo  simulations.  The  distributions  are  divided  into  five  subsections  as  follows. 

Continuous  Distributions 

There  are  27  continuous  distributions.  For  the  most  part,  they  make  use  of  three  parameters:  a  location  parame¬ 
ter,  a;  a  scale  parameter,  b\  and  a  shape  parameter,  c.  There  are  a  few  exceptions  to  this  notation.  In  the  case  of 
the  normal  distribution,  for  instance,  it  is  customary  to  use  p  for  the  location  parameter  and  a  for  the  scale 
parameter.  In  the  case  of  the  beta  distribution,  there  are  two  shape  parameters  and  these  are  denoted  by  v  and  w. 
Also,  in  some  cases,  it  is  more  convenient  for  the  user  to  select  the  interval  via  xmin  and  xmax  than  the  location 
and  scale.  The  location  parameter  merely  shifts  the  position  of  the  distribution  on  the  x-axis  without  affecting 
the  shape,  and  the  scale  parameter  merely  compresses  or  expands  the  distribution,  also  without  affecting  the 
shape.  The  shape  parameter  may  have  a  small  effect  on  the  overall  appearance,  such  as  in  the  Weibull  distribu¬ 
tion,  or  it  may  have  a  profound  effect,  as  in  the  beta  distribution. 

Discrete  Distributions 

There  are  nine  discrete  distributions.  For  the  most  part,  they  make  use  of  the  probability  of  an  event,  p,  and  the 
number  of  trials,  n. 

Empirical  and  Data-Driven  Distributions 

There  are  four  empirical  distributions. 

Bivariate  Distributions 

There  are  five  bivariate  distributions. 

Distributions  Generated  from  Number  Theory 

There  are  two  number-theoretic  distributions. 
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5.1  Continuous  Distributions 

To  aid  in  selecting  an  appropriate  distribution,  we  have  summarized  some  characteristics  of  the  continuous  distribu¬ 
tions  in  Table  1.  The  subsections  that  follow  describe  each  distribution  in  more  detail. 


Table  1.  Properties  for  Selecting  the  Appropriate  Continuous  Distribution 


Distribution  Name 

Parameters 

Symmetric  About  the  Mode 

Arcsin 

•*min  and 

yes 

Beta 

*mim  *max.  and  shape  V  and  w 

only  when  v  and  w  are  equal 

Cauchy  (Lorentz) 

location  a  and  scale  b 

yes 

Chi-Square 

shape  v  (degrees  of  freedom) 

no 

Cosine 

*min  and  Xjnax 

yes 

Double  Log 

•*min  and 

yes 

Erlang 

scale  b  and  shape  c 

no 

Exponential 

location  a  and  scale  b 

no 

Extreme  Value 

location  a  and  scale  b 

no 

F  Ratio 

shape  v  and  w  (degrees  of  freedom) 

no 

Gamma 

location  a ,  scale  b ,  and  shape  c 

no 

Laplace  (Double  Exponential) 

location  a  and  scale  b 

yes 

Logarithmic 

•*rnin  and 

no 

Logistic 

location  a  and  scale  b 

yes 

Lognormal 

location  a ,  scale  //,  and  shape  a 

no 

Normal  (Gaussian) 

location  //  and  scale  a 

yes 

Parabolic 

■*min  and  -Tmax 

yes 

Pareto 

shape  c 

no 

Pearson’s  Type  5  (Inverted  Gamma) 

scale  b  and  shape  c 

no 

Pearson’s  Type  6 

scale  b  and  shape  v  and  w 

no 

Power 

shape  c 

no 

Rayleigh 

location  a  and  scale  b 

no 

Student’s  t 

shape  v  (degrees  of  freedom) 

yes 

Triangular 

*min>  *max>  and  shape  c 

only  when  c  =  (x^  +  x^/2 

Uniform 

■*min  and 

yes 

User-Specified 

^min>  ^max  and  ymin?  ymax 

depends  upon  the  function 

Weibull 

location  a,  scale  b ,  and  shape  c 

no 

12 


5.1.1  Arcsine 


Figure  5.  Arcsine  Density  Function. 


Figure  6.  Arcsine  Distribution  Function. 
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5.1.2  Beta 


Density  Function: 


Distribution  Function: 


/(*)  = 


jcm(1  - 
B(v,  w) 

0 


0<x<\ 

otherwise 


where  B(v,  w)  is  the  beta  function,  defined  by  B(v,  w)  s  J  tv  *(1  -  t)w  {dt 

o 

J  Bx(v,  w)/B(v,  w)  0<x<\ 

F(x)  =  < 

j  0  otherwise 

X 

where  B^(v,  w)  is  the  incomplete  beta  function,  defined  by  Bx(v,  w )  =  j 


Input: 

jtmin,  minimum  value  of  random  variable;  jcmax,  maximum  value  of  random  variat 
v  and  w,  positive  shape  parameters 

Output: 

x  ^  -*max) 

Mode: 

(v  -  l)/(v  +  w  -  2)  for  v  >  1  and  w  >  1  on  the  interval  [0, 1] 

Mean: 

v/(v  +  w)  on  the  interval  [0, 1] 

Variance: 

vw/[(v  +  w)2(l  +  v  +  w)]  on  the  interval  [0, 1] 

Algorithm: 

( 1 )  Generate  two  IID  gamma  variates,  Y x  ~  gamma  ( 1 ,  v)  and  Y2  ~  gamma  ( 1 ,  w) 

(2)  RetumX  =  |Jrmin  +  (^ax_JCmin)Wl+y2)  ifv~W 

[  -^max  (**max  *min)  ^2)  if  V  <  W 

Source  Code: 

double  beta(  double  v,  double  w,  double  xMin,  double  xMax  ) 

{ 

if  (  v  <  w  )  return  xMax  -  (  xMax  -  xMin  )  *  beta(  w,  v  ) ; 
double  yl  =  gamma (  0 . ,  1 . ,  v  ) ; 
double  y2  =  gamma (  0 . ,  1 . ,  w  ) ; 

return  xMin  +  (  xMax  -  xMin  )  *  yl  /  (  yl  +  y2  ) ; 

] 

Notes: 

(1)  X  ~  Beta(v,  w)  if  and  only  if  1  -  X  ~  Beta(w,  v). 

(2)  When  v  =  w  =  1/2,  this  reduces  to  the  arcsine  distribution. 

(3)  When  v  =  w  =  1,  this  reduces  to  the  uniform  distribution. 

Examples  of  probability  density  functions  and  cumulative  distribution  functions  are  shown  in  Figures  7  and  8, 
respectively. 


Figure  7.  Beta  Density  Functions. 


Figure  8.  Beta  Distribution  Functions. 
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5.1.3  Cauchy  (Lorentz) 


,  s  i 

(  X  -  a  A2 

-  -1 

Density  Function: 

f{x)=Yb 

[1+l  »  J 

-oo  <  X  <  oo 

i 

1 

^  1 

Distribution  Function: 

F<*>=2  + 

—  — oo  <  X  <  OO 

Input: 

a,  location  parameter; 

b,  scale  parameter  is  the  half-width  at  half-maximum 

Output: 

x  e  (  oo,  oo) 

Mode: 

a 

Median: 

a 

Mean: 

a 

Variance: 

Does  not  exist 

Regression  Equation: 

tan  [7ti.Fi  - 1/2)]  =  x-Jb  - 

alb 

Algorithm: 

(1)  Generate  U  ~  U(- 1/2, 1/2) 

(2)  Return  X  =  a  +  b  tan  (tcU) 

Source  Code: 

double  cauchy (  double 

af  double  b  ) 

{ 

assert<  b  >  0.  ); 

return  a  +  b  *  tan(  M_PI  *  uniform(  -0.5,  0.5  )  ); 

} 


Examples  of  probability  density  functions  and  cumulative  distribution  functions  are  shown  in  Figures  9  and  10, 
respectively. 


Figure  9.  Cauchy  Density  Functions.  Figure  10.  Cauchy  Distribution  Functions. 
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5.1.4  Chi-Square 


Density  Function: 


Distribution  Function: 


Shape  parameter  v  >  1  is  the  number  of  degrees  of  freedom 
x  e  (0,  oo) 
v  -  2  for  v  >  2 
v 
2v 

Return  X  ~  gamma  (0, 2,  v/2) 

double  chiSquare(  int  df  ) 

{ 

assert(  df  >=  1  ) ; 

return  gamma (  0 . ,  2 . ,  0.5  *  double (  df  )  ) ; 

Notes:  (1)  The  chi-square  distribution  with  v  degrees  of  freedom  is  equal  to  the  gamma 

distribution  with  a  scale  parameter  of  2  and  a  shape  parameter  of  v/2. 

(2)  Let  Xt  ~  N(0, 1)  be  IID  normal  variates  for  /  =  1,  •  •  • ,  v.  Then  X2  =  2  xf 

i=  1 

is  a  %  distribution  with  v  degrees  of  freedom. 


Input: 

Output: 

Mode: 

Mean: 
Variance: 
Algorithm: 
Source  Code: 


Examples  of  probability  density  functions  and  cumulative  distribution  functions  are  shown  in  Figures  11  and  12, 
respectively. 


Figure  11.  Chi-Square  Density  Functions.  Figure  12.  Chi-Square  Distribution  Functions. 
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5.1.5  Cosine 


Density  Function: 


Distribution  Function: 

Input: 

Output: 

Mode: 

Median: 

Mean: 

Variance: 

Regression  Equation: 
Algorithm: 

Source  Code: 


•^rain  —  x  —  -*max 

otherwise 
X<  *min 

^  -^min  —  *  —  -*max 
x  ^  ^max 

*min,  minimum  value  of  random  variable;  *max,  maximum  value  of  random  variable; 
location  parameter  a  =  (xmin  +  xraax)/2;  scale  parameter  b  -  (xmax  -  xmm)/7r 

X  G  [xmjn,  Xmax) 

(■^min  +  Xm9X)/2 
(■*min  Xm2x)l2 
(•*min  ■*"  Xmax)/2 

{xmaK  -  Xmin)\jt2  -%)!A7t2 

sin-1  (2F,-  - 1)  =  Xj/b  -alb, 

where  the  xt  are  arranged  in  ascending  order,  F,  =  UN,  and  i  =  1 ,2,  -  -  - ,  Ar 

(1)  Generate  U  ~U(- 1, 1) 

(2)  Return  X  =  a  +  b  sin-1  U 

double  cosine (  double  xMin,  double  xMax  ) 

{ 

assert (  xMin  <  xMax  ) ; 

double  a  =  0 . 5  *  (  xMin  +  xMax  ) ;  //  location  parameter 

double  b  *  (  xMax  -  xMin  )  /  M_PI;  //  scale  parameter 

return  a  +  b  *  asin(  uniform(  -1.,  1.  )  ); 

} 


f(x)  = 


F(X): 


1  fx-a 
2bC0S{~~b~ 


0 

1  +  sin 
1 


x-a 


The  probability  density  function  and  the  cumulative  distribution  function  are  shown  in  Figures  13  and  14,  respec¬ 
tively. 


Figure  13.  Cosine  Density  Function.  Figure  14.  Cosine  Distribution  Function. 


17 


5.1.6  Double  Log 


Density  Function: 


Distribution  Function: 


Input: 

Output: 

Mode: 

Median: 


/(*)  = 


ilQ 


(\x-a\ 


■^min  —  *  —  -^max 


otherwise 


F{x)  = 


1  f\x-a\ 
2~{  2b 


1  f\x~a\ 

2  +l  lb 


1  -  In 

1  -In 


\x-a\ 

b 

\x  —  a\ 


■*min  —  X  <  G 


a  <  X  <  X„ 


xmm,  minimum  value  of  random  variable;  *max,  maximum  value  of  random  variable; 
location  parameter  a  =  (*min  +  *max)/2 ;  scale  parameter  b  =  (*max  -  xmin)/2 

X  G  [*min>  «*max) 


a  (Note  that,  strictly  speaking,  f(a )  does  not  exist  since  lim  f(x)  =  oo.) 

x  a 


Mean: 

Variance: 

Algorithm: 


Source  Code: 


a 

(*max  ~  -*min)  /36 

Based  on  composition  and  convolution  formula  for  the  product  of  two  uniform  densities: 

(1)  Generate  two  IID  uniform  variates,  Ut  ~  U  (0, 1),  z  =  1, 2 

(2)  Generate  a  Bernoulli  variate,  U  ~  Bernoulli  (0. 5) 

(3)  If  U  =  1,  return  X  =  a  +  bU  iU2;  otherwise,  return  X  =  a-b  U{U2 

double  doubleLog(  double  xMin,  double  xMax  ) 

1 

assert (  xMin  <  xMax  ) ; 

double  a  =  0 . 5  *  (  xMin  +  xMax  ) ?  //  location  parameter 

double  b  =  0 . 5  *  (  xMax  -  xMin  ) ;  //  scale  parameter 

if  (  bernoulli(  0.5  )  )  return  a  +  b  *  uniform ( )  *  uniform ( ) ; 

else  return  a-b  *  uniform( )  *  uniform( ) ; 


The  probability  density  function  and  the  cumulative  distribution  function  are  shown  in  Figures  15  and  16,  respec¬ 
tively. 


Figure  15.  Double  Log  Density  Function. 


Figure  16.  Double  Log  Distribution  Function. 
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5.1.7  Erlang 


Density  Function: 


Distribution  Function: 

Input 

Output: 

Mode: 

Mean: 

Variance: 

Algorithm: 


Source  Code: 


Notes: 


(x/b)c~1e~x,b 


/(*)  = 


b(c-  1)! 
0 


x>0 

otherwise 


F(x)  = 


1  -e 


rx/b 


c~\ 


(x/by 


ifo  H 


x  >0 
otherwise 


Scale  parameter  b  >  0;  shape  parameter  c,  a  positive  integer 

x  e  [0,  oo) 

b(c- 1) 

be 

b2c 

This  algorithm  is  based  on  the  convolution  formula. 

(1)  Generate  c  IID  uniform  variates,  Ut  ~  U  (0, 1) 

(2)  Return X  =  -by£\nUi=-b\nYlUi 

i=i  i=i 


double  erlang (  double  b,  int  c  ) 

C 

assert(  b  >  0.  &&  c  >=  1  ) ; 


double  prod  =  1.0; 

for  (  int  i  =  0;  i  <  c;  i++  )  prod  *=  uniform(  0.,  1.  >? 
return  -b  *  log(  prod  ); 

1 

The  Erlang  random  variate  is  the  sum  of  c  exponentially-distributed  random  variates, 
each  with  mean  b.  It  reduces  to  the  exponential  distribution  when  c  =  1. 


Examples  of  probability  density  functions  and  cumulative  distribution  functions  are  shown  in  Figures  17  and  18, 
respectively. 


1- 

y  fix)  b  =  1 

i- 

F(x)  ^ - = 

0.75- 

\c=l 

0.75- 

c=l 

\ 

/  C=X 

0.5- 

\ 

0.5- 

/  /  /'C- 3 

\_c  =  2 

/  /  / 

0.25- 

0.25- 

// 

b  —  1 

0- 

LX 

0  — 

- - - , - - - — - 1 - - r - 1 - 1 - 

1  | I  I  [II  I  I  l  i  i  i  i 

0123456  0123456 

a:  * 

Figure  17.  Erlang  Density  Functions.  Figure  18.  Erlang  Distribution  Functions. 
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5.1.8  Exponential 


Density  Function: 


Distribution  Function: 


/(*)  = 


J_  -(x-d)!b 

b 

o 


F(x)  = 


1  _  e~<x-a)lb 
0 


x>  a 

otherwise 

x>a 

otherwise 


Input: 

Output: 

Mode: 

Median: 

Mean: 

Variance: 

Regression  Equation: 

Maximum  Likelihood: 
Algorithm: 

Source  Code: 


Location  parameter  a ,  any  real  number;  scale  parameter  b  >  0 
x  e  [a,  oo) 
a 

a  +  b  ln2 
a  +  b 
b 2 

-ln(l  -Fi)  -  Xilb  -  alb , 

where  the  xt  are  arranged  in  ascending  order,  Ft  =  UN ,  and  i  =  1 , 2,  •  •  • ,  AT 
=  X,  the  mean  value  of  the  random  variates 

(1)  Generate  U  -  U{ 0, 1) 

(2)  Return X  =  a-b\nU 

double  exponential (  double  ar  double  b  ) 

C 

assert(  b  >  0.  ); 

return  a  -  b  *  log<  uniform(  0.r  1.  )  ); 

} 


Examples  of  probability  density  functions  and  cumulative  distribution  functions  are  shown  in  Figures  19  and  20, 
respectively. 


0  1  2  3  0  1  2  3 


x 


Figure  19.  Exponential  Density  Functions. 


Figure  20.  Exponential  Distribution  Functions. 
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5.1.9  Extreme  Value 


Density  Function: 

Distribution  Function: 

Input: 

Output: 

Mode: 

Median: 

Mean: 

Variance: 

Regression  Equation: 
Algorithm: 

Source  Code: 


Notes: 


f(x)  =  —  e^x  a)lb  exp[-e^a)/b]  ~oo  <  x  <  oo 

b 

F(x)  =  1  -  exp[-eix~a)/b]  -oo  <  x  <  oo 
Location  parameter  a ,  any  real  number;  scale  parameter  b  >  0 
x  e  (— oo,  oo) 
a 

a  +  b  lnln2 

a —  by  where  7  «  0. 57721  is  Euler’s  constant 
b27r2i6 

ln[-ln(l -Fi)\  =  xi/b-afb, 

where  the  xt  are  arranged  in  ascending  order,  Ft  =  UN,  and  i  =  1 , 2,  •  •  • ,  N 

(1)  Generate U  ~U( 0,1) 

(2)  Return  X  ~a  +  b\n(-\nU) 

double  extremeValue (  double  a,  double  b  ) 

{ 

assert(  b  >  0.  ); 

return  a+b  *  log(  -log(  uniform(  0.r  1.  )  )  ); 

} 

This  is  the  distribution  of  the  smallest  extreme.  The  distribution  of  the  largest 
extreme  may  be  obtained  from  this  distribution  by  reversing  the  sign  of  X  relative  to 
the  location  parameter  a,  i.e.,  X  =>  -(X  -  a). 


Examples  of  probability  density  functions  and  cumulative  distribution  functions  are  shown  in  Figures  21  and  22, 
respectively. 


Figure  21.  Extreme  Value  Density  Functions. 


Figure  22.  Extreme  Value  Distribution  Functions. 
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5.1.10  F  Ratio 


Density  Function: 


Distribution  Function: 

Input 

Output: 

Mode: 

Mean: 

Variance: 

Algorithm: 

Source  Code: 


/«= 


r[(v  +  w)/2]  ( vlwyax(v-2) 12 

ro/2)r(w/2)  (i  +  xv/w)(v+"’V2  x  ~ 

0  otherwise 


oo 

where  T(z)  is  the  gamma  function,  defined  by  T(z)  =  i  tz~x  e~!  dt 

0 

No  closed  form,  in  general. 

Shape  parameters  v  and  w  are  positive  integers  (degrees  of  freedom) 

x  e  [0,  oo) 

w(v-2)r 

— - —  for  v  >  2 

v(w  +  2) 

w/(w  -  2)  for  w  >  2 
2w2(v  + w-2) 


v(w  -  2)2(w  -  4) 


for  w  >  4 


(1)  Generate  V  -  j2(v)  and  W  -  ^2(w) 

Wv 

(2)  Return  X  =  — — 

Iv/w 


double  £Ratio(  int  vr  int  w  ) 

C 

assert(  v  >=  1  &&  w  >=  1  ) ; 

return  (  chiSquare(  v  )  /  v  )  /  (  chiSquare(  w  )  /  w  )? 


Examples  of  the  probability  density  function  and  the  cumulative  distribution  function  are  shown  in  Figures  23  and 
24,  respectively. 


Figure  23.  F-Ratio  Density  Function. 


Figure  24.  F-Ratio  Distribution  Function. 
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5.1.11  Gamma 


Density  Function: 


Distribution  Function: 


/(*)  = 


—  b~c(x-a)c-le-{x-a),b 
T(c) 


0 


x  >  a 

otherwise 


where  T(z)  is  the  gamma  function,  defined  by  T(z)  =  J  tz  1  e  1  dt 

o 

If  n  is  an  integer,  F(n)  =  (n  -  1)! 

No  closed  form,  in  general.  However,  if  c  is  a  positive  integer,  then 


0  otherwise 


Input: 

Output: 

Mode: 

Mean: 

Variance: 

Algorithm: 


Location  parameter  a;  scale  parameter  b>  0;  shape  parameter  c  >  0 
x  e  [a,  oo) 

ia  +  b(c  -  1)  if  c  >  1 
|  a  if  c  <  1 

a  +  bc 
b2c 

There  are  three  algorithms  (Law  and  Kelton  1991),  depending  upon  the  value  of  the 
shape  parameter  c. 

Case  1:  c  <  1 
Let  0=1  +  de. 

(1)  Generate  UY  ~  t/(0, 1 )  and  set  P  =  0UX . 

If  P  >  1,  go  to  step  3;  otherwise,  go  to  step  2. 

(2)  Set  Y  =  Pllc  and  generate  U2~  U (0, 1). 

If  Ui  <  e~Y ,  return  X  =  Y ;  otherwise,  go  back  to  step  1. 

(3)  Set  Y  =  -  In  [(0  -  P)lc ]  and  generate  U2  ~  U(0, 1). 

If  U2  <YC~\  return  X  =  Y;  otherwise,  go  back  to  step  1 . 

Case  2:  c  =  1 

Return  X  ~  exponential  (a,  b). 

Case  3:  c  >  1 

Let  a  =  \Nlc -  1,  0  =  c-\nA,q  =  c+  1/a,  6  =  4.5,  and  d  =  1  +ln0. 

(1)  Generate  two  IID  uniform  variates,  U\  ~  U(0, 1)  and  U2~U(0, 1). 

(2)  Set  V  =  a  In  [UY/{1  —  C/i)],  Y  =  cev ,  Z  =  U\U2,  and  W  =  0  +  qV  -  Y . 

(3)  If  W  +  d  -  OZ  >  0,  return  X  =  Y;  otherwise,  proceed  to  step  4. 

(4)  If  W  >  In  Z,  return  X  =  Y ;  otherwise,  go  back  to  step  1. 
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Source  Code: 


double  gamma (  double  a,  double  b,  double  c  ) 

[ 

assert(  b  >  0.  &&  c  >  0.  ); 

const  double  A  =  1.  /  sqrt<  2.  *  c  -  1.  ); 
const  double  B  =  c  -  log (  4 .  ) ; 

const  double  Q  =  c  +  1 .  /A; 

const  double  T  =  4.5; 

const  double  D  =  1 .  +  log (  T  ) ; 

const  double  C  =  1.  +  c  /  M_E ; 

if  (  c  <  1.  )  { 

while  (  true  )  { 

double  p  =  C  *  uniform(  0.,  1.  ); 
if  (  P  >  1.  )  { 

double  y  =  -log(  (  C  -  p  )  /  c  ); 

if  (  uniform(  0.,  1.  )  <=  pow<  y,  c  -  1.  )  )  return  a  +  b  *  y; 

} 

else  { 

double  y  =  pow(  p,  1.  /  c  ); 

if  (  uniform(  0.,  1.  )  <=  exp(  -y  )  )  return  a  +  b  *  y; 

} 

} 

} 

else  if  (  c  i.o  )  return  exponential (  af  b  ); 
else  { 

while  (  true  )  ( 

double  pi  =  uniform(  0.,  1.  ); 
double  p2  =  uniform(  0.,  1.  ); 
double  v  =  A  *  log(  pi  /  (  1.  -  pi  )  ); 
double  y  =  c  *  exp (  v  ) ; 
double  z  =  pi  *  pi  *  p2; 
double  w=B+Q*v-y? 

if  (w+D-T*z>=0.  | |  w  >=  log<  z  )  )  return  a  +  b  *  y; 

} 

} 

} 

Notes:  (1)  When  c  =  1,  the  gamma  distribution  becomes  the  exponential  distribution. 

(2)  When  c  is  an  integer,  the  gamma  distribution  becomes  the  erlang  distribution. 

(3)  When  c  =  v/2  and  b  =  2,  the  gamma  distribution  becomes  the  chi-square 
distribution  with  v  degrees  of  freedom. 


Examples  of  probability  density  functions  and  cumulative  distribution  functions  are  shown  in  Figures  25  and  26, 
respectively. 


01234567  01234567 


Figure  25.  Gamma  Density  Functions.  Figure  26.  Gamma  Distribution  Functions. 
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5.1,12  Laplace  (Double  Exponential) 


Density  Function: 


Distribution  Function: 


Output: 

Mode: 

Median: 

Mean: 

Variance: 

Regression  Equation: 


Algorithm: 


Source  Code: 


1  (  be -ah 

/(*)  =  2b  CXPl - b~)  ~°° <  *  <  °° 

—  e<-x~aVb  x  <  a 

F(*)  =  ]2  , 

1  _  1  x>a 

2 

Location  parameter  a,  any  real  number;  scale  parameter  b  >  0 
x  e  ( — oo,  oo) 


| In (2 Ft)  =  Xj/b  -alb  0  <  F,  <  1/2 

{  -  In  [2(1  -  Ft))  =  Xj/b  -  alb  1/2  <Ft<  1 

where  the  xt  are  arranged  in  ascending  order,  F,  =  i/N,  and  /  =  1,2,  •  •  • ,  N 

(1)  Generate  two  IID  random  variates,  U\  ~  t/(0, 1)  and  U2  ~f/(0, 1) 

\a  +  blnU2  if  Ux  >  1/2 


(2)  Return  X 


a-b\nU2  if  <  1/2 


double  laplace(  double  a,  double  b  ) 
f 

assert(  b  >  0.  ); 

//  composition  method 

if  <  bernoulli (  0.5  )  )  return  a  +  b  *  log(  uniform(  0.r  1.  )  ); 
else  return  a  -  b  *  log(  uniform(  0.f  1.  )  ); 


Examples  of  probability  density  functions  and  cumulative  distribution  functions  are  shown  in  Figures  27  and  28 
respectively. 


a= 0  I  1- 


=  1  0.25 

.  b  =  2 


Figure  27.  Laplace  Density  Functions. 


Figure  28.  Laplace  Distribution  Functions. 


5.1.13  Logarithmic 


Density  Function: 


Distribution  Function: 

Input: 

Output: 

Mode: 

Mean: 

Variance: 

Algorithm: 

Source  Code: 


/(*H 


x-a 

~b~ 


\ 


0 


X  *  <  r  <  r 

-^min  —  ^  —  ^max 

otherwise 


X  ^  -*min 
■^min  —  X  —  **max 
x  ^  *max 


;tmin,  minimum  value  of  random  variable;  *max,  maximum  value  of  random  variable; 
location  parameter  a  =  *min;  scale  parameter  =  jcmax  -  ;cmin 

e  -^max) 


•*min  ^  (-*max  -^min) 


I  (-^max  **min) 


Based  on  the  convolution  formula  for  the  product  of  two  uniform  densities, 

(1)  Generate  two  IID  uniform  variates,  U{  ~U( 0,1)  and  U2~U( 0,1) 

(2)  Return X  =  a  +  bUlU2 


double  logarithmic (  double  xMin,  double  xMax  ) 
{ 

assert (  xMin  <  xMax  ) ; 


double  a  =  xMin;  //  location  parameter 

double  b  =  xMax  -  xMin;  //  scale  parameter 
return  a  +  b  *  uniform(  0.,  1.  )  *  uniform(  0.,  1.  ); 


The  probability  density  function  and  the  cumulative  distribution  function  are  shown  in  Figures  29  and  30,  respec¬ 
tively. 


JC 

Figure  29.  Logarithmic  Density  Function. 


x 


Figure  30.  Logarithmic  Distribution  Function. 
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5.1.14  Logistic 


Density  Function: 

j  e(x-a)lb 

n  ’  b[\  +  e^lbf 

Distribution  Function: 

F«=  1  +  eHx-aVb  C0<X<°0 

Input 

Location  parameter  a,  any  real  number;  scale  parameter  b  >  0 

Output: 

X  €  ( — OO,  oo) 

Mode: 

a 

Median: 

a 

Mean: 

a 

Variance: 

*r2 

—  b 2 

3 

Regression  Equation: 

-]n(FJl-l)  =  Xi/b-a/b, 

where  the  xt  are  arranged  in  ascending  order,  Ft  =  UN,  and  i  = 

Algorithm: 

(1)  Generate  U  ~U( 0,1) 

(2)  Return  X  =  a- b In ( U~l  - 1) 

Source  Code: 

double  logistic (  double  a,  double  b  ) 

t 

assert (  b  >  0.  ); 

return  a  -  b  *  log(  1.  /  uniform(  0*f  1.  )  -  1.  ); 


} 

Examples  of  probability  density  functions  and  cumulative  distribution  functions  are  shown  in  Figures  31  and  32, 
respectively. 


-3-2-10123  -3-2-101 


Figure  31.  Logistic  Density  Functions.  Figure  32.  Logistic  Distribution  Functions. 
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5.1.15  Lognormal 


Density  Function: 

Distribution  Function: 

Input: 

Output: 

Mode: 


1 

V2 ~7t  a(x  -  a) 
0 


exp 


[ln(*-g)-/42 

2o2 


x  >  a 
otherwise 


x  >  a 
otherwise 


Location  parameter  a ,  any  real  number,  merely  shifts  the  origin;  shape  parameter 
a  >  0;  scale  parameter  jj.  is  any  real  number 


x  e  [a,  00) 


a  +  e^~° 


Median: 

Mean: 

Variance: 

Regression  Equation: 

Maximum  Likelihood: 
Algorithm: 

Source  Code: 

Notes: 


a  +  eM 
a  +  en+°2'2 

e2^°\e°2  - 1) 

erT1(2Fl-  -  1)  =  -L  In  (Xi-a)-^~, 

y2 a  V2 a 

where  the  xt  are  arranged  in  ascending  order,  Ft  =  UN,  and  i  =  1, 2,  •  •  • ,  N 
H  ^  jc,  and  <r2  =  -f  XOn*,-  -  juf 

A  i= 1  A  i=[ 

(1)  Generate  V  ~  N{fi ,  <y2) 

(2)  Return  X  =  a  +  ev 

double  lognormal (  double  a,  double  mu,  double  sigma  ) 

C 

return  a  +  exp(  normal (  mu,  sigma  )  ); 

} 

X  ~  LV(//,  o2)  if  and  only  if  lnX  -  N(ju ,  a2).  Note  that  fx  and  o2  are  not  the  mean 
and  variance  of  the  LN(ju,  a2)  distribution.  To  generate  a  lognormal  random  variate 

with  given  fi  and  a2,  set  //  =  In  (fi2!^  fi2  +  a2)  and  a2  =  In  [(//2  +  &2)lfi2]. 


Examples  of  probability  density  functions  and  cumulative  distribution  functions  are  shown  in  Figures  33  and  34, 
respectively. 


Figure  33.  Lognormal  Density  Functions.  Figure  34.  Lognormal  Distribution  Functions. 
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5.1.16  Normal  (Gaussian) 


Density  Function: 


m= 


“nr—  eXp 

v2a  a 


~(x  —  /Q2 
2o2 


Distribution  Function: 

Input: 

Output: 

Mode: 

Median: 

Mean: 

Variance: 

Regression  Equation: 


Maximum  Likelihood: 
Algorithm: 


Location  parameter  ju,  any  real  number;  scale  parameter  a  >0 
x  e  (-00, 00) 

V 

M 

M 

a2 

erT1  (2 Ft  -  1 )  =  *,/V2<7  -  /W2a, 

where  the  Xi  are  arranged  in  ascending  order,  Ft  =  UN ,  and  i  =  1, 2,  •  •  • ,  N 

M  =  ^  T,  Xi  aado2  =  'Zixi-  m)2 

N  i==  i  iV 

(1)  Independently  generate  Ui  ~U(- 1,1)  and  U2~U (-1, 1) 

(2)  Set  U  =  U\  +  U\  (note  that  the  square  root  is  not  necessary  here) 

(3)  If  U  <  1 ,  return  X  =  //  +  aUx  V— 2  In  C//C/ ;  otherwise,  go  back  to  step  1 


Source  Code: 


Notes: 


double  normal (  double  mu,  double  sigma  ) 

i 

assert (  sigma  >0.  ); 
double  p,  pi,  p2 ; 
do  { 

pi  =  uniform(  -1.,  1.  ); 
p2  =  uniform(  -1.,  1.  )? 
p  =  pi  *  pi  +  p2  *  p2; 

}  while  (  p  >=  1 .  ) ; 

return  mu  +  sigma  *  pi  *  sqrt(  -2.  *  log(  p  )  /  P  )? 

} 

If  X  ~  N(fi,  a2),  then  exp(X)  ~  LN(p,  a2),  the  lognormal  distribution. 


The  probability  density  function  and  cumulative  distribution  function  for  the  standard  normal  are  shown  in  Figures 
35  and  36,  respectively. 


Figure  35.  Normal  Density  Function. 


Figure  36.  Normal  Distribution  Function. 
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5.1.17  Parabolic 

Density  Function: 

Distribution  Function: 
Input: 

Output: 

Mode: 

Median: 

Mean: 

Variance: 

Algorithm: 

Source  Code: 


Notes: 


f(x)  = 


4b  L 


i  -(- 


X- 


b 


-f 


'-min  —  X  2  -Vmax 


17/  ( a  +  2b-x)(x-a  +  b f 

^3  -''min  ~  X  —  Xmax 

xm[n,  minimum  value  of  random  variable;  Jtmax,  maximum  value  of  random  variable; 
location  parameter  a  =  ( xmm  +  *max)/2;  scale  parameter  b  =  (;cmax  -  xmin)!2 

x  ^  [-'■min’  -'"max) 

(^■min  "F  •'■max  V ■ 2 
(■*min  XmdtX)l2 

(^min  Xtaxjy2 

(*max  ~  *min)  0 

Uses  the  acceptance-rejection  method  on  the  above  density  function  /(jc) 

//  function  to  call  for  the  Monte  Carlo  sampling 
double  parabola (  double  x,  double  xMin,  double  xMax  ) 

C 

if  (  x  <  xMin  ||  x  >  xMax  )  return  0.0; 


double  a  =  0 . 5  *  (  xMin  +  xMax  ) ; 
double  b  =  0 . 5  *  (  xMax  -  xMin  ) ; 
double  yMax  =  3 .  /  (  4 .  *  b  ) ; 


//  location  parameter 
//  scale  parameter 


return  yMax  *<  1.  -  (  x  -  a  )  *  (x-a  )  /  (  b  *  b  )  ); 


//  function  which  generates  parabolic  distribution 
double  parabolic (  double  xMin,  double  xMax  ) 

C 

assert (  xMin  <  xMax  ) ; 

double  a  =  0 . 5  *  (  xMin  +  xMax  ) ;  //  location  parameter 

double  yMax  =  parabola (  a,  xMin,  xMax  );  //  max  function  range 

return  user Specif ied(  parabola,  xMin,  xMax,  0.,  yMax  ); 

1 

The  parabolic  distribution  is  a  special  case  of  the  beta  distribution  (when  v  =  w  =  2). 


The  probability  density  function  and  the  cumulative  distribution  function  are  shown  in  Figures  37  and  38,  respec¬ 
tively. 


Figure  37.  Parabolic  Density  Function. 


Figure  38.  Parabolic  Distribution  Function. 
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5.1.18  Pareto 


Density  Function: 

Distribution  Function: 

Input: 

Output: 

Mode: 

Median: 

Mean: 

Variance: 

Regression  Equation: 

Maximum  Likelihood: 
Algorithm: 

Source  Code: 


f  cx~c~l 

X>1 

/w={o 

otherwise 

f  \-x~c 

*>  1 

FW={o 

otherwise 

Shape  parameter  c  >  0 
X  €  [l,oo) 


1 

21/c 

c/{c  - 1)  for  c  >  1 

[c/(c  -  2)]  -  [c/(c  - 1)]2  for  c  >  2 

-ln(l  -Fj)  =  clnx, 

where  the  xt  are  arranged  in  ascending  order,  Ft  =  UN,  and  i  =  1, 2,  •  •  • ,  N 

K*iH" 

(1)  Generate  U  ~  U(0, 1) 

(2)  Return  X  =  U~l/c 

double  pareto (  double  c  ) 

C 

assert(  c  >  0.  ); 

return  pow(  uniform{  0.,  1.  ),  -1.  /c  ); 

} 


Examples  of  probability  density  functions  and  cumulative  distribution  functions  are  shown  in  Figures  39  and  40, 
respectively. 


Figure  39.  Pareto  Density  Functions. 


Figure  40.  Pareto  Distribution  Functions. 
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5.1.19  Pearson’s  Type  5  (Inverted  Gamma) 


Density  Function: 


Distribution  Function: 


f(x)  = 


xHc+De-blx 


if  x  >  0 


ir«T(c) 

0  otherwise 


oo 

where  r(z)  is  the  gamma  function,  defined  by  T(z)  =  i  tz~le~‘  dt 


F(x)  = 


r (C,b/x)  .e  ^  n 
.  if  x  >  0 

r(c) 

0  otherwise 


OO 

where  F(a ,  z)  is  the  incomplete  gamma  function,  defined  by  T(a,  z)  =  |  ta~xe~l  dt 


Input: 

Output: 

Mode: 

Mean: 

Variance: 

Algorithm: 


Scale  parameter  b  >  0;  shape  parameter  c>  0 
x  e  [0,  oo) 
bl(c+  1) 

bl(c  -  1)  for  c  >  1 

b2/[(c  —  l)2(c  -  2))  for  c  >  2 

(1)  Generate  Y  ~  gamma  (0,  l/b ,  c) 

(2)  Return  X  =  MY 


Source  Code:  double  pearson5  (  double  b,  double  c  ) 

{ 

assert{  b  >  0.  &&  c  >  0.  ); 
return  1 .  /  gamma (  0.,  1.  /b  ,  c); 

} 


Notes:  X  -  PearsonType5(&,  c)  if  and  only  if  MX  -  gamma  (0,  Mb,c).  Thus,  the  Pearson 

Type  5  distribution  is  sometimes  called  the  inverted  gamma  distribution . 

Examples  of  probability  density  functions  and  cumulative  distribution  functions  are  shown  in  Figures  41  and  42, 
respectively. 


Figure  41.  Pearson  type  5  Density  Functions. 


Figure  42.  Pearson  type  5  Distribution  Functions. 


32 


5.1.20  Pearson’s  Type  6 


Density  Function: 


Distribution  Function: 


Input: 

Output: 

Mode: 


Mean: 

Variance: 

Algorithm: 

Source  Code 


Notes 


/(*)  = 


(x/by-' 

bB(v,  w)[l  +  (x/b)]v+w 
0 


if  jc  >  0 
otherwise 


l 

where  B(v,  w)  is  the  Beta  function,  defined  by  B(v,  vv)  =  J  tv  *(1  -  t)w  ldt 

o 


if  jc>  0 


0  otherwise 

where  FB(x)  is  the  distribution  function  of  a  B(v,  w)  random  variable 
Scale  parameter  b  >  0;  shape  parameters  v  >  0  and  w  >  0 


x  e  [0,  oo) 

[fc(v-D 
(w  +  1) 

if  v  >  1 

0 

otherwise 

bv 

.  for  w  >  1 

w  - 1 
fc2v(v  +  w-  1) 


for  w  >  2 


(vv-  l)2(vv-2) 

(1)  Generate  Y  ~  gamma(0,  v,b)  and  Z  -  gamma(0,  w,£) 

(2)  Return  X  =  YIZ 


double  pearson6(  double  b,  double  v,  double  w  ) 

C 

assert (  b  >  0.  &&  v  >  0.  &&  w  >  0.  ); 
return  gamma (  0.,  b,  v)  /  gamma (  0.,  br  w  ); 

} 

X  ~  PearsonType6(l,  v,  vv)  if  and  only  if  X/(l  +  X)  ~  B(v,  w). 


Examples  of  probability  density  functions  and  cumulative  distribution  functions  are  shown  in  Figures  43  and  44, 
respectively. 


Figure  43,  Pearson  Type  6  Density  Functions. 


Figure  44.  Pearson  Type  6  Distribution  Functions. 
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5.1.21  Power 

Density  Function: 
Distribution  Function: 
Input: 

Output: 

Mode: 

Median: 

Mean: 

Variance: 

Regression  Equation: 

Maximum  Likelihood: 
Algorithm: 

Source  Code: 


Notes: 


fix)  =  cxc~'  0  <  x  <  1 
F(x)  =  xc  0  <  *  <  1 
Shape  parameter  c  >  0 
*  e  [0, 1) 

jo  if  c  <  1 
1 1  if  c  >  1 

2~llc 


(c+1) 


(c+l)2(c  +  2) 
In  Fj  =  clnjq, 


where  the  xt  are  arranged  in  ascending  order,  Ft  =  UN,  and  i  =  1,2, 


c  —  — 


1  N 

N^laX‘ 

Kn  i= l 


-1 


(1)  Generate  U  ~  f/(0, 1) 

(2)  Return  X  =  Ul,c 


double  power (  double  c  ) 

c 

assert(  c  >  0.  ); 

return  pow(  uniform(  0.f  1.  ),  1.  /c  ); 


This  reduces  to  the  uniform  distribution  when  c  =  1. 


N 


Examples  of  probability  density  functions  and  cumulative  distribution  functions  are  shown  in  Figures  45  and  46, 
respectively. 


0  0.25  0.5  0.75  1  0  0.25  0.5  0.75  1 


x 


x 


Figure  45.  Power  Density  Functions. 


Figure  46.  Power  Distribution  Functions. 
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5.1.22  Rayleigh 


Density  Function: 


/(*)  = 


2  (x-a 
b 


exp 

0 


x-a 


\2 


x>a 

otherwise 


1  -  exp 


x-a 


x>a 


Distribution  Function: 

F(x)  =  - 

L  ;j 

0  otherwise 

Input: 

Location  a ,  any  real  number;  scale  b  >  0 

Output: 

a;  e  [a,  oo) 

Mode: 

a  +  b/^Jl 

Median: 

a  +  Win  2 

Mean: 

a  +  b4n!2 

Variance: 

b\\-n!4 ) 

Regression  Equation: 

V-  In  (1  -  Fj)  =  x-Jb  -  alb, 

where  the  xi  are  arranged  in  ascending  order,  F,  =  UN,  and  i  =  1, 2,  •  •  • ,  N 

Maximum  Likelihood: 

Sr 

II 

N  o')112 
i=l  ) 

Algorithm: 

(1)  Generate  U  ~U( 0,1) 

(2)  Return  X  =  a  +  &V-ln  U 

Source  Code: 

double  rayleigh(  double  ar  double  b  ) 

{ 

assert (  b  >  0.  ); 

return  a  +  b  *  sqrt(  -log(  unifona(  0.f  1.  )  )  ); 

} 

Notes: 

Rayleigh  is  a  special  case  of  the  Weibull  when  the  shape  parameter  c  =  2. 

Examples  of  the  probability  density  function  and  the  cumulative  distribution  function  are  shown  in  Figures  47  and 
48,  respectively. 


Figure  47.  Rayleigh  Density  Function. 


Figure  48.  Rayleigh  Distribution  Function. 


35 


5.1.23  Student’s  t 


Density  Function: 


Distribution  Function: 
Input: 

Output: 

Mode: 

Median: 

Mean: 

Variance: 

Algorithm: 

Source  Code: 


Notes: 


•2  N 


„  n(v+i)/2]  ft  ,  *2 

f(x)  V^r(v/2)  (  - 


-(v+l)/2 


— oo  <  X  <  oo 


where  F(z)  is  the  gamma  function,  defined  by  F(z)  = 


jV  le  1  dt 

o 


No  closed  form,  in  general 

Shape  parameter  v,  a  positive  integer  (number  of  degrees  of  freedom) 
x  e  (— oo,  oo) 


0 

0 

0 


v/(v  -  2)  for  v  >  2 

(1)  Generate  Y  ~  N( 0, 1)  and  Z  -  j2(v) 

(2)  Return  X  =  Y/-JzJv 

double  studentT(  int  df  ) 

{ 

assert (  df  >=  1  ) ? 

return  normal (  0.,  1.  )  /  sqrt(  chiSquare(  df  )  /  df  ); 

} 

For  v  >  30,  this  distribution  can  be  approximated  with  the  unit  normal  distribution. 


Examples  of  the  probability  density  function  and  the  cumulative  distribution  function  are  shown  in  Figures  49  and 
50,  respectively. 


Figure  49.  Student’s  t  Density  Functions.  Figure  50.  Student’s  t  Distribution  Functions. 


5.1.24  Triangular 


Density  Function: 


Distribution  Function: 

Input: 

Output: 

Mode: 

Median: 

Mean: 

Variance: 

Algorithm: 


Source  Code: 


/(*)  = 


F(x)  = 


Xmax  Xrr 


C  Xm 


-x 


-X, 


•^max  C 


min  ^max 


(■^max  *min)(^  ^min) 


•*min  —  X  <  C 


C  —  —  -^max 

•^min  —  X  ^  C 


1-- 


(*max  x) 


C<X<X„ 


(**max  -^minX^max 

jcmin,  minimum  value  of  random  variable;  *max,  maximum  value  of  random  variable; 
c,  location  of  mode 

X  €  timin’  -*max) 


I  *min  +  V (*max  ”  ^min)(c  “  xmm)^  if  C  >  (*min  +  Xmax)/2 
|  *max  ““  V (-*max  “  -^minX^max  —  C)I2  if  C  <  (-*min  +  ^maxX^ 

(-*min  *max  +C)H 

[3(*max  *min)^  +  (^min  -*max  “  2c)  ]  /  72 
(1)  Generate  U  ~  U(0, 1) 

f  *mm  +  V(* max  ~  *min)(c  ~  ^  U  <  (c  -  *min)/(jtraax  -  *min) 

{  *max  -  V(^max  ~  *minX*max  _  c)0  ~U)  H  U  >{c-  *min )/(*max  -  *min) 
double  triangular (  double  xMin,  double  xMax,  double  c  ) 
assert(  xMin  <  xMax  &&  xMin  <=  c  &&  c  <=  xMax  ); 


double  p  =  uniform(  0.,  1.  ),  q=l.  -  p; 

if  (  p  <=  (  c  -  xMin  )  /  (  xMax  -  xMin  )  ) 

return  xMin  +  sqrt(  (  xMax  -  xMin  )  *  (  c  -  xMin  )  *  p  ) ; 
else 

return  xMax  -  sqrt(  (  xMax  -  xMin  )  *  (  xMax  -  c  )  *  q  ); 

} 

Examples  of  probability  density  functions  and  cumulative  distribution  funtions  are  shown  in  Figures  51  and  52, 
respectively. 


Figure  51.  Triangular  Density  Functions. 


Figure  52.  Triangular  Distribution  Functions. 
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5.1.25  Uniform 


Density  Function: 


/«  = 


<  x  <  Xr. 


0  otherwise 


0  X  <  -Emin 

Distribution  Function: 

F(x)  =  < 

X  ~  -^min  ^  , 

*^min  ^  X  <.  -£max 

*max  ~  -*min 

1  **max  ^  X 

Input: 

xmm,  minimum  value  of  random  variable;  *max,  maximum  value  of  random  variable 

Output: 

*  e  t^mim  -^max) 

Mode: 

Does  not  uniquely  exist 

Median: 

(•*min  +  -*maxV2 

Mean: 

(^min  -^max)^ 

Variance: 

(•*max 

^in)2/12 

Algorithm: 

(1)  Generate  U  —  £/  (0, 1 ) 

(2)  Return  X  —  ^min  (-^max  “  -^min)  U 

Source  Code: 

double  uni form (  double  xMin,  double  xMax  ) 

C 

assert {  xMin  <  xMax  ) ; 

return  xMin  +  (  xMax  -  xMin  )  *  _u ( ) ; 

} 


Notes:  (1)  The  source  code  for  _u  ( )  referenced  above  is  given  in  section  6. 

(2)  The  uniform  distribution  is  the  basis  for  most  distributions  in  the  Random  class. 

(3)  The  uniform  distribution  is  a  special  case  of  the  beta  distribution 
(when  v  =  w  =  1). 


The  probability  density  function  and  the  cumulative  distribution  function  are  shown  in  Figures  53  and  54,  respec¬ 
tively. 


Figure  53.  Uniform  Density  Function. 


Figure  54.  Uniform  Distribution  Function. 


38 


5.1.26  User-Specified 

Density  Function: 

Input: 

Output: 

Algorithm: 


Source  Code: 


Notes: 


User-specified,  nonnegative  function  f(x) 
f(x),  nonnegative  function; 

xmjn  and  xmax,  minimum  and  maximum  value  of  domain; 
ymin  and  ymax,  minimum  and  maximum  value  of  function 

X  €  [Amin’ 

(1)  Generate  A~U( 0,  Amax) and Y  ~U (ymjn,  ^max)* 

where  Amax  ■  (xmax  -  xm  J(ymzx  -  ymin)  is  the  area  of  the  rectangle  that  encloses 

the  function  over  its  specified  doman  and  range 

(2)  Return  X  =  xmin  +  AJ{ymax  -  ymin)  if  f(X)  <  Y;  otherwise,  go  back  to  step  1 

double  user Specif ied(  double(  *usf  )(  double,  //  function 

double,  //  xMin 

double  ),  //  xMax 

double  xMin,  double  xMax,  //  domain 

double  yMin,  double  yMax  )  //  range 

*  assert (  xMin  <  xMax  &&  yMin  <  yMax  ); 

double  x,  y,  areaMax  -  (  xMax  -  xMin  )  *  (  yMax  -  yMin  ); 

//  acceptance-rejection  method 
do  { 

x  a  uniform (  0.0,  areaMax  )  /  (  yMax  -  yMin  )  +  xMin; 
y  =  uniform(  yMin,  yMax  ) ; 

}  while  (  y  >  usf(  x,  xMin,  xMax  )  )? 
return  x; 

} 

In  order  to  qualify  as  a  true  probaility  density  function,  the  integral  of  f(x)  over  its 

domain  must  equal  1,  but  that  is  not  a  requirement  here.  As  long  as  f(x)  is  nonneg¬ 

ative  over  its  specified  domain,  it  is  not  necessary  to  normalize  the  function.  Notice 
also  that  an  analytical  formula  is  not  necessary  for  this  algorithm.  Indeed,  f(x) 
could  be  an  arbirarily  complex  computer  program.  As  long  as  it  returns  a  real  value 
in  the  range  [ymin,  ymax]>  it  is  suitable  as  a  generator  of  a  random  number  distribution. 


Examples  of  a  user-specified  bimodal  probability  density  and  the  corresponding  distribution  are  shown  in  Figures  55 
and  56,  respectively.  Note  that  it  is  not  necessary  to  have  knowledge  of  F(x),  only  f(x)  and  that  the  function  f(x) 
can  be  arbitarily  complex. 


Figure  55.  User-Specified  Density  Function. 


Figure  56.  User-Specified  Distribution  Function. 
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5.1.27  Weibull 


Density  Function: 


Distribution  Function: 

Input: 

Output: 

Mode: 

Median: 

Mean: 

Variance: 

Regression  Equation: 
Algorithm: 

Source  Code: 


Notes: 


/«= 


C  ,x~asc 

- (—7—)  exp 

x-a  b 


b 


x  >  a 

otherwise 


1  -  exp 


x  >  a 


0  otherwise 


Location  a ,  any  real  number;  scale  b  >  0;  shape  c  >  0 
x  e  [a,  oo) 

j  a  +  b(l  -  l/c),/c  if  c  >  1 
|  a  if  c  <  1 

a  +  b(la2)l,c 

a  +  bT[{c  +  \)lc] 

b\T[{c  +  2)1  c]  -  (r[(c  +  l)/c])2) 

ln[-  In  (1  -  F,)]  =  c  In  (*,-  -  a)  -  c  In  b, 

where  the  are  arranged  in  ascending  order,  F,  =  i/N,  and  i  =  1,2,  •  •  • ,  N 

(1)  Generate  U  ~  U( 0, 1) 

(2)  Return  X  =  a  +  b(-\nU)i/c 


double  weibull (  double  a,  double  b,  double  c  ) 

{ 

assert(  b  >  0.  &&  c  >  0.  ); 

return  a  +  b  *  pow(  -log(  uniform(  0.,  1.  )  ),  1.  /  c  ); 


(1)  When  c  =  1,  this  becomes  the  exponential  distribution  with  scale  b . 

(2)  When  c  =  2  for  general  b,  it  becomes  the  Rayleigh  distribution. 


Examples  of  probability  density  functions  and  cumulative  distribution  functions  are  shown  in  Figures  57  and  58, 
respectively. 


Figure  57.  Weibull  Density  Functions. 


Figure  58.  Weibull  Distribution  Functions. 
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5.2  Discrete  Distributions 

The  discrete  distributions  make  use  of  one  or  more  of  the  following  parameters. 
p  -  the  probability  of  success  in  a  single  trial. 

n  -  the  number  of  trials  performed  or  number  of  samples  selected. 

k  -  the  number  of  successes  in  n  trials  or  number  of  trials  before  first  success. 

N  -  the  number  of  elements  in  the  sample  (population). 

K  -  the  number  of  successes  contained  in  the  sample. 
m  -  the  number  of  distinct  events. 
ju  -  the  success  rate. 
i  -  smallest  integer  to  consider. 
j  -  largest  integer  to  consider. 

To  aid  in  selecting  an  appropriate  distribution.  Table  2  summarizes  some  characteristics  of  the  discrete  distributions. 
The  subsections  that  follow  describe  each  distribution  in  more  detail. 


Table  2.  Parameters  and  Description  for  Selecting  the  Appropriate  Discrete  Distribution 


Distribution  Name 

Parameters 

Output 

Bernoulli 

P 

success  (1)  or  failure  (0) 

Binomial 

n  and  p 

number  of  successes  (0  <  k  <  n) 

Geometric 

P 

number  of  trials  before  first  success  (0  <  k  <  oo) 

Hypergeometric 

n ,  N,  and  K 

number  of  successes  (0  <  k  <  min  ( n ,  K )) 

Multinomial 

n,m,  p\,  •  ■  • ,  pm 

number  of  successes  of  each  event  (1  <  kt  <  m) 

Negative  Binomial 

p  and  K 

number  of  failures  before  K  accumulated  successes  (0  <  k  <  oo) 

Pascal 

p  and  K 

number  of  trials  before  K  accumulated  successes  (1  <  k  <  oo) 

Poisson 

M 

number  of  successes  (0  <  k  <  oo) 

Uniform  Discrete 

i  and  j 

integer  selected  ( i<k<  j ) 
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5.2.1  Bernoulli 


A  Bernoulli  trial  is  the  simulation  of  a  probabilistic  event  with  two  possible  outcomes:  success  (X  =  1)  or  failure 
(X  =  0),  where  the  probability  of  success  on  a  single  trial  is  p.  It  forms  the  basis  for  a  number  of  other  discrete  dis¬ 
tributions. 


Density  Function: 

'w' 

II 

^3  — 

1 

0 

1 

Distribution  Function: 

0  <  *  <  1 

k>  1 

Input: 

Probability  of  event. 

p,  where  0  <  p  <  1 

Output: 

i 

t  €{0,1} 

0  if  p  <  1/2 

Mode: 

< 

0, 1  if  1/2 

1  if  p  >  1/2 

Mean 

Variance: 

Maximum  Likelihood: 
Algorithm: 


P 


p{\-p) 

p  =  X,  the  mean  value  of  the  IID  Bernoulli  variates 
(1)  Generate  U  ~  U( 0, 1) 


(2)  Return  X  = 


1  ifU<p 
0  HU  >  p 


Source  Code: 


Notes: 


bool  bernoulli(  double  p  ) 

C 

assert(  0.  <=  p  &&  p  <=  1.  ); 
return  uniform(  0.,  1.  )  <  p; 

} 

(1)  Notice  that  if  p  is  strictly  zero,  then  the  algorithm  above  always  returns  X  =  0, 
and  if  p  is  strictly  one,  it  always  returns  X  =  1,  as  it  should. 

(2)  The  sum  of  n  IID  Bernoulli  variates  generates  a  binomial  distribution. 

Thus,  the  Bernoulli  distribution  is  a  special  case  of  the  binomial  distribution 
when  the  number  of  trials  is  one. 

(3)  The  number  of  failures  before  the  first  success  in  a  sequence  of  Bernoulli  trials 
generates  a  geometric  distribution. 

(4)  The  number  of  failures  before  the  first  n  successes  in  a  sequence  of  Bernoulli 
trials  generates  a  negative  binomial  distribution. 

(5)  The  number  of  Bernoulli  trials  required  to  produce  the  first  n  successes 
generates  a  Pascal  distribution. 
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5.2.2  Binomial 

The  binomial  distribution  represents  the  probability  of  k  successes  in  n  independent  Bernoulli  trials,  where  the  prob¬ 
ability  of  success  in  a  single  trial  is  p . 


Mean:  np 

Variance:  np(l-p) 

Maximum  Likelihood:  p  -  XI n ,  where  X  is  the  mean  of  the  random  variates 

Algorithm:  ( 1 )  Generate  n  IID  Bernoulli  trials  X/  -  Bemoulli(p),  where  i  =  1 ,  •  •  • ,  n 

(2)  Return  X  =  X\  +  •  •  •  +  Xn 

Source  Code:  int  binomial  (  int  n,  double  p  ) 

assert (  0.  <=  p  &&  p  <=  1.  &&  n  >=  1  ); 
int  sum  “  0? 

for  (  int  i  =  0;  i  <  n;  i++  )  sum  +=  bernoulli(  p  ); 
return  sum? 

} 

Notes:  (1)  The  binomial  reduces  to  the  Bernoulli  when  n  =  1. 

(2)  Poisson  (np)  approximates  binomial  ( n ,  p)  when  p  «  1  and  n  »  1. 

(3)  For  large  n,  the  binomial  can  be  approximated  by  N(np,np),  provided  np>  5 
and  0. 1  <  p  <  0. 9  —  and  for  all  values  of  p  when  np  >  25. 

Examples  of  the  probability  density  function  and  the  cumulative  distribution  function  are  shown  in  Figures  59  and 
60,  respectively. 


0123456789  10  0123456789  10 

Number  of  Successes,  k  Number  of  Successes,  k 


Figure  59.  Binomial  Density  Function.  Figure  60.  Binomial  Distribution  Function. 
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5.2.3  Geometric 


The  geometric  distribution  represents  the  probability  of  obtaining  k  failures  before  the  first  success  in  independent 
Bernoulli  trials,  where  the  probability  of  success  in  a  single  trial  is  p.  Or,  to  state  it  in  a  slightly  different  way,  it  is 
the  probability  of  having  to  perform  k  trials  before  achieving  a  success  (i.e.,  the  success  itself  is  not  counted). 


Density  Function: 


/(*)  = 


Pd-P)k 

0 


k  e{0,lf— > 
otherwise 


Distribution  Function: 

Input: 

Output 

Mode: 

Mean: 

Variance 

Maximum  Likelihood: 
Algorithm: 


l-(l-p)*+1  *>0 

0  otherwise 

Probability  of  event,  p ,  where  0  <  p  <  1 
Number  of  trials  before  a  success  k  e  {0, 1 ,  •  •  •  } 

0 

(i  -p)/p 
a  -p)/p2 

p  =  1/(1  +  X),  where  X  is  the  mean  value  of  the  HD  geometric  variates 

(1)  Generate  V  ~  U{ 0, 1) 

(2)  Return  X  ~  int(lnl//ln(l  -  p)) 


Source  Code:  int  geometric  (  double  p  ) 

{ 

assert (  0.  <  p  &&  p  <  1.  ); 

return  int(  log(  uniform(  0.,  1.  )  )  /  log(  1.  -  p  )  ); 

1 


Notes:  (1)  A  word  of  caution:  There  are  two  different  definitions  that  are  in  common  use 

for  the  geometric  distribution.  The  other  definition  is  the  number  of  failures  up 
to  and  including  the  first  success. 

(2)  The  geometric  distribution  is  the  discrete  analog  of  the  exponential  distribution. 

(3)  If  Xx ,  X2,  ■  *  *  is  a  sequence  of  independent  Bernoulli  ( p )  random  variates  and 
X  =  min  {i  3  Xt  =  1}- 1,  then  X  -  geometric  (p). 


Examples  of  the  probability  density  function  and  the  cumulative  distribution  function  are  shown  in  Figures  61  and 
62,  respectively. 


Figure  61*  Geometric  Density  Function. 


Figure  62.  Geometric  Distribution  Function. 
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5.2.4  Hypergeometric 

The  hypergeometric  distribution  represents  the  probability  of  k  successes  in  n  Bernoulli  trials,  drawn  without 
replacement ;  from  a  population  of  N  elements  that  contains  K  successes. 


Density  Function: 


Distribution  Function: 


KYN-K 
k  a  n-k 


,  where  = - : - is  the  binomial  coefficient 

\k )  k\(n-k)\ 


KYN-K 
k  k  n-i 


,  where  0  <  k  <  min  (K,  n) 


Input: 

Output: 

Mean: 


Number  of  trials,  n;  population  size,  N;  successes  contained  in  the  population,  K 
The  number  of  successes  k  e  {0, 1 , • ■ *  min  ( K ,  n) } 
np9  where  p  =  KIN 


Variance: 


Source  Code: 


1 

int  hypergeometric {  int  n,  int  N,  int  K  ) 
t 

assert(  0  <=  n  &&  n  <=  N  ); 
assert (  N>=l&&K>-0); 

int  count  =  0? 

for  (  int  i  =  0;  i  <  n?  i++,  N--  )  { 

double  p  =  double (  K  )  /  double (  N  ) ; 
if  (  bernoulli(  p  )  )  {  count++;  K--;  } 

} 

return  count; 


Notes: 


hypergeometric  (n,  N9  K)  «  binomial  («,  KIN),  provided  n/N  <  0. 1 


Examples  of  the  probability  density  function  and  the  cumulative  distribution  function  are  shown  in  Figures  63  and 
64,  respectively. 

f(x)  «  =  6  1-1  F\x)  | 


n  =  6 
N  =  10 
K  =  4 


Figure  63.  Hypergeometric  Density  Function. 


Figure  64.  Hypergeometric  Distribution  Function. 
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5.2.5  Multinomial 

The  multinomial  distribution  is  a  generalization  of  the  binomial  so  that  instead  of  two  possible  outcomes,  success  or 
failure,  there  are  now  m  disjoint  events  that  can  occur,  with  corresponding  probability  ph  where  i  e  1,2,  •  •  • ,  m,  and 
where  p{  +  p2  +  •  •  •  +  pm  -  1  •  The  density  function  represents  the  probability  that  event  1  occurs  kx  times,  •  •  •,  and 
event  m  occurs  km  times  in  k\  H - 1 -km-n  trials. 


Density  Function: 


f(kuk2,---,kj  = 


n! 


k,\k2\- 


■*«! 


*1  *2 
Pi  Pz 


=  n\ 


m  7,*' 

IT  — 

f=i  h'- 


Input: 


Output: 

Algorithm: 


Source  Code: 


Notes: 


Number  of  trials,  n  >  1; 

number  of  disjoint  events,  m>  2; 

probability  of  each  event,  ph  with  Pi  +  •  •  •  +  pm  =  1 

Number  of  times  each  of  the  m  events  occurs,  kt  e  {0,  *  •  • ,  n  >, 
where  i  =  1,  •  •  • ,  m,  and  k{  +  •  *  •  +  km  =  n 

The  multinomial  distribution  is  obtained  through  simulation. 

(1)  Generate Ut  ~  t/( 0,  l)forz  =  1  ,•**,« 

(2)  For  each  Uh  locate  probability  subinterval  that  contains  it  and  increment  counts 


void  multinomial (  int  n, 

double  p [ ] , 
int  count [ ] , 

int  m  ) 


{ 


//  trials  n 

//  probability  vector  p, 

//  success  vector  count, 

//  number  of  disjoint  events  m 


assert(  m  >=  2  ) ; 
double  sum  =  0 . ; 

for  (  int  bin  =  0;  bin  <  m;  bin++  )  sum  +=  p[  bin  ]; 
assert(  sum  ==  1 .  ); 

for  (  int  bin  =  0;  bin  <  m;  bin++  )  count [  bin  ]  =  0; 
//  generate  n  uniform  variates  in  the  interval  [0,1) 
for  (  int  i  =  0;  i  <  n;  i++  )  { 


double  lower  =  0 . ,  upper  =  0 . ,  u  =  _u ( ) ; 

for  (  int  bin  =  0?  bin  <  m;  bin++  )  £ 

//  locate  subinterval,  of  length  p[  bin  ], 
//  that  contains  the  variate  and 
//  increment  corresponding  counter 


lower  =  upper; 
upper  +=  p [  bin  ] ; 

if  (  lower  <=  u  &&  u  <  upper  )  {  count [  bin  ]++;  break;  } 

} 

J 

} 

The  multinomial  distribution  reduces  to  the  binomial  distribution  when  m-  2. 
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5.2.6  Negative  Binomial 

The  negative  binomial  distribution  represents  the  probability  of  k  failures  before  the  sth  success  in  a  sequence  of 
independent  Bernoulli  trials,  where  the  probability  of  success  in  a  single  trial  is  p. 


Density  Function: 

Distribution  Function: 

Input: 

Output: 


(s  +  k-l)\ 
k\(s- 1)! 


ps(l-p)k  ke{ 0,  !,•••} 


otherwise 


y p\i-p)‘  x>o 
F(k)  =  \  h  iKs-iy. 

0  otherwise 


Probability  of  event,  p,  where  0  <  p  <  1;  number  of  successes  s  >  1 
The  number  of  failures  k  e  {0, 1,  •  •  • } 


j  y  and  y  + 1  if  y  is  an  integer 
lint  (y)+l  otherwise 

where  y  =  [s(l  -  p)  -  l]/p  and  int(y)  is  the  smallest  integer  <  y 


Mean:  $(1  -  p)lp 

Variance:  s(l  -  p)/p2 

Maximum  Likelihood:  p  =  s/(s  +  X),  where  X  is  the  mean  value  of  the  IID  variates 

Algorithm:  This  algorithm  is  based  on  the  convolution  formula. 

(1)  Generate  s  IID  geometric  variates,  X,-  ~  geometric  (p) 

(2)  Return  X  =  Xi  H - 1-  Xs 


Source  Code:  int  negativeBinomial(  int  s,  double  p  ) 

C 

assert(  s  >=  1  ) ? 
int  sum  =  0; 

for  (  int  i  =  0  ?  i  <  s ;  i++  )  sum  +=  geometric (  p  ) ; 
return  sum; 

} 

Notes:  (1)  If  Xj,  •  •  • ,  Xs  are  geometric  (p)  variates,  then  the  sum  is  negativeBinomial  (s,  p). 

(2)  The  negativeBinomial  (1,  p)  reduces  to  geometric  (p). 

Examples  of  the  probability  density  function  and  the  cumulative  distribution  function  are  shown  in  Figures  65  and 
66,  respectively. 


Figure  65.  Negative  Binomial  Density  Function.  Figure  66.  Negative  Binomial  Distribution  Function. 
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5.2.7  Pascal 

The  Pascal  distribution  represents  the  probability  of  having  to  perform  k  trials  in  order  to  achieve  s  successes  in  a 
sequence  of  n  independent  Bernoulli  trials,  where  the  probability  of  success  in  a  single  trial  is  p . 


Density  Function: 


Distribution  Function: 

Input: 

Output 

Mode: 

Mean 

Variance 

Maximum  Likelihood: 
Algorithm: 

Source  Code: 


Notes: 


/(*)=■ 

F(k)  =  < 


(*  —  !)! 


ps{\-p)k-s 


k  €  {s,s  +!,-••  } 


0 

y  (*-D! 

0 


otherwise 

p\l-pts  k>  s 

otherwise 


Probability  of  event,  p,  where  0  <  p  <  1;  number  of  successes,  s  >  1 
The  number  of  trials  k  e  {s,  s  + 1,  •  •  •  } 

The  integer  tt  that  satisfies  l+np>  s>\  +  (rt-l)p 
sip 

j(1  -  p)/p2 

p  =  sin ,  where  n  is  the  number  of  trials  [unbiassed  estimate  is  ( s  -  1)1  (n  -  1)] 

This  algorithm  takes  advantage  of  the  logical  relationship  to  the  negative  binomial. 
Return  X  =  negativeBinomial  (5,  p)  +  s 


int  pascal (  int  s,  double  p  ) 

{ 

return  negativeBinomial (  s,  p,  )  +  s; 

} 

(1)  The  Pascal  and  binomial  are  inverses  of  each  other  in  that  the  binomial  returns 
the  number  of  successes  in  a  given  number  of  trials,  whereas  the  Pascal  returns 
the  number  of  trials  required  for  a  given  number  of  successes. 

(2)  Pascal  ( s ,  p)  =  negativeBinomial  (5,  p)  +  s . 

(3)  Pascal  ( p ,  1)  =  geometric  (p)  + 1. 


Examples  of  the  probability  density  function  and  the  cumulative  distribution  function  are  shown  in  Figures  67  and 
68,  respectively. 


Figure  67.  Pascal  Density  Function. 


Figure  68.  Pascal  Distribution  Function. 
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5.2.8  Poisson 

The  Poisson  distribution  represents  the  probability  of  k  successes  when  the  probability  of  success  in  each  trial  is 
small  and  the  rate  of  occurrence,  //,  is  constant. 

k  €.  {0, 1 , •  •  •  > 

Density  Function:  k\ 

J  0  otherwise 


x4e_/'  k*° 

Distribution  Function:  F(k)  =  •{  i  =  o  1  • 

0  otherwise 


Input:  Rate  of  occurrence,  p  >  0 

Output:  The  number  of  successes  k  e  {0, 1 ,  •  •  • } 


Mode: 


f  p  - 1  and  p  if  p  is  an  integer 
1  int  (ft)  otherwise 


Mean:  p 

Variance:  P 

Algorithm:  (1)  Set  a  =  e~M,  b  =  1 ,  and  i  =  0 

(2)  Generate  UM  ~  1/(0, 1)  and  replace  b  by  b\JM 

(3)  If  b  <  a,  return  X  =  i;  otherwise,  replace  i  by  i  +  1  and  go  back  to  step  2 

Source  Code:  int  poisson (  double  mu  ) 

{ 

assert (  mu  >  0.  ); 
double  b  =  1 . ; 
int  i; 

for  (  i  =  0;  b  >=  exp(  -mu  );  i++  )  b  *=  uniform(  0.,  1.  ); 
return  i  -  1? 

} 

Notes:  (1)  The  Poisson  distribution  is  the  limiting  case  of  the  binomial  distribution  as 

n  -»  oo,  p  ->  0  and  np  p:  binomial  (n,  p )  *  Poisson  (p),  where  p  =  np. 

(2)  For  p  >  9,  Poisson  (p)  may  be  approximated  with  N(p,  p),  if  we  round  to 
the  nearest  integer  and  reject  negative  values. 

Examples  of  the  probability  density  function  and  the  cumulative  distribution  function  are  shown  in  Figures  69  and 
70,  respectively. 


Figure  69.  Poisson  Density  Function.  Figure  70.  Poisson  Distribution  Function. 
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5.2.9  Uniform  Discrete 

The  Uniform  Discrete  distribution  represents  the  probability  of  selecting  a  particular  item  from  a  set  of  equally  prob¬ 
able  items. 


Density  Function: 


Distribution  Function: 

Input: 

Output: 

Mode: 

Mean: 

Variance: 

Algorithm: 

Source  Code: 


Notes: 


/w= 


i 


**min  1 


max  lmin 
0 


^  e  {*min’  '*'» *max } 

otherwise 


F(k)  = 


_  *min  +  1  i  .  <k<i 

_  .  .  ‘min  —  ^  —  ‘max 

*max  *min  * 


k>L 


Minimum  nteger,  imin;  maximum  integer,  zmax 

^  ^  {*min>  *  *  * » *max  } 

Does  not  uniquely  exist,  as  all  values  in  the  domain  are  equally  probable 

2  (*min  +  *max) 

1  9 

t^'max  —  *min  +  1 )  “  1 1 

(1)  Generate  U  ~  U( 0, 1) 

(2)  Return  X  —  zmin  +  int  ( [zmax  —  Zmin  +  1]  U ) 


int  uniformDiscrete(  int  i,  int  j  ) 

{ 

assert(  i  <  j  ) ? 

return  i  +  int(  (  j  -  i  +  1  )  *  uniform*  0.,  1.  )  )? 

3 

(1)  The  distribution  uniformDiscrete(0, 1)  is  the  same  as  Bernoulli  (1/2). 

(2)  Uniform  Discrete  distribution  is  the  discrete  analog  of  the  uniform  distribution. 


Examples  of  the  probability  density  function  and  the  cumulative  distribution  function  are  shown  in  Figures  71  and 
72,  respectively. 


k 


k 


Figure  71.  Uniform  Discrete  Density  Function. 


Figure  72.  Uniform  Discrete  Distribution  Function. 
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5.3  Empirical  and  Data-Driven  Distributions 

The  empirical  and  data-driven  distributions  make  use  of  one  or  more  of  the  following  parameters. 
x  -  data  point  in  a  continuous  distribution. 

F  -  cumulative  distribution  function  for  a  continuous  distribution. 
k  -  data  point  in  a  discrete  distribution. 

p  -  probability  value  at  a  discrete  data  point  for  a  discrete  distribution. 

P  -  cumulative  probability  for  a  discrete  distribution. 

To  aid  in  selecting  an  appropriate  distribution.  Table  3  summarizes  some  characteristics  of  these  distributions.  The 
subsections  that  follow  describe  each  distribution  in  more  detail. 


Table  3.  Parameters  and  Description  for  Selecting  the  Appropriate  Empirical  Distribution 


Distribution  Name 

Input 

Output 

Empirical 

Empirical  Discrete 

Sampling  With  and  Without  Replacement 

Stochastic  Interpolation 

file  of  (*,,  Ft)  data  pairs 

file  of  (kh  data  pairs 

file  of  ki  data 

file  of  2-D  data  points  y,) 

interpolated  data  point  x 

selection  of  a  data  point  k 

selection  of  a  data  point  k 

new  2-D  data  point  (*,  y) 
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5.3.1  Empirical 

Distribution  Function: 


Input: 


Output: 

Algorithm: 


Source  Code: 


Notes: 


The  distribution  function  is  specified  at  a  number  of  distinct  data  points  and  is  lin¬ 
early  interpolated  at  other  points: 

F(x)  =  F(xi)  +  [F(xi+{)  -  F(x()]  for  xt<x<  xi+i , 

xi+l  -  Xi 

where  xh  i  =  0, 1,  •  *  •  n  are  the  data  points,  and  F(x()  is  the  cumulative  probability  at 
the  point  x*. 

We  assume  that  the  empirical  data  is  in  the  form  of  a  histogram  of  n  +  1  pairs  of  data 
points  along  with  the  corresponding  cumulative  probability  value: 


*0 

Fix  o) 

*1 

Fix ,) 

*2 

F{x2) 

Xn 

F(xn ) , 

where  F(jc0)  =  0,  F{xn)  =  1,  and  F(xt)  <  F(xI+1).  The  data  points  are  required  be  in 
ascending  order  but  need  not  be  equally  spaced  and  the  number  of  pairs  is  arbitrary. 

X  €:  [*0,  Xn) 

This  algorithm  works  by  the  inverse  transform  method. 

(1)  Generate  U  ~  f/(0, 1) 

(2)  Locate  index  i  such  that  F(xt)  <  U  <  F(xi+{) 

(3)  Return  X  =  „  +  (%,-  *,) 

double  empirical (  void  ) 

{ 

static  vector <  double  >  x,  cdf? 

static  int  n; 

static  bool  init  =  false; 

if  (  !init  )  { 

ifstream  in(  "empiricalDistribution"  ); 
if  (  tin  )  { 

cerr  <<  "Cannot  open  \"empiricalDistribution\"  file"  <<  endl; 
exit(  1  )? 

} 

double  value,  prob; 

while  (  in  >>  value  >>  prob  )  {  //  read  in  empirical  data 

x.push_back(  value  ); 
cdf .push_back(  prob  )? 

} 

n  =  x.size( ) ; 
init  =  true; 

//  check  that  this  is  indeed  a  cumulative  distribution 
assert (  0.  ==  cdf[  0  ]  &&  cdf[  n  -  1  ]  ==  1 .  ) ; 

for  (  int  i  =  1;  i  <  n;  i++  )  assert (  cdf [  i  -  1  ]  <  cdf [  i  ]  ); 

} 

double  p  =  uniform (  0.,  1.  ); 
for  (  int  i  =  0 ;  i  <n-  1;  i++  ) 

if  (  cdf[  i  ]  <=  p  &&  p  <  cdf[  i  +  1  ]  ) 

return  x[i]+(x[i+i]-x[ij  )*(p-  cdf[  i  ]  )  / 

(  cdf [  i  +  1  ]  -  cdf [  i  ]  ) ; 

return  x[  n  -  1  ] ; 


(1)  The  data  must  reside  in  a  file  named  empiricalDistribution. 

(2)  The  number  of  data  pairs  in  the  file  is  arbitrary  (and  is  not  a  required  input  as  the 
code  dynamically  allocates  the  memory  required). 
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5.3.2  Empirical  Discrete 

Density  Function: 

Distribution  Function: 

Input: 

Output: 

Algorithm: 

Source  Code: 


Notes: 


This  is  specified  by  a  list  of  data  pairs,  (kh  pi),  where  each  pair  consists  of  an  integer 
data  point,  kit  and  the  corresponding  probability  value,  p{. 

F(kj)  =  E  A  =  Pj 

i  =  1 

Data  pairs  pi),  where  i  =  1,2,  The  data  points  must  be  in  ascending  order 

by  data  point  but  need  not  be  equally  spaced  and  the  probabilities  must  sum  to  one: 

n 

kt  <  kj  if  and  only  if  i  <  j  and  E  p,  =  1- 


x  €  {kl,k2,---,kn} 

(1)  Generate  U  ~U( 0, 1) 

7-i  J 

(2)  Locate  index  j  such  that  E  p,-  <  U  <  E  A 

i  =  1  i  =  I 

(3)  Return  X  =  kj 

int  empiricalDiscrete (  void  ) 

static  vector<  int  >  k?  x _  i  i 

static  vector <  double  >  f[  2  ];  //  pdf  is  f[  0  ]  and  cdf  xs  f[  1  ] 

static  double  max; 

static  int  n; 

static  bool  init  =  false; 


if 


(  Unit  )  { 

if stream  in  (  "empiricalDiscrete"  ) ; 

if  (  lin  )  t 

cerr  <<  "Cannot  open  \"empiricalDxscrete\" 
exit(  1  ) ; 


file"  «  endl; 


} 

int  value; 

double  freq?  . 

while  (  in  >>  value  >>  freq  )  {  //  read  xn  empirical  data 

k.push_back(  value  ); 
f[  0  ] .push_back(  freq  ); 


} 

n  =  k.size( ) ; 


init  =  true; 


//  form  the  cumulative  distribution 


f[  1  ].push_back(  f[  0  ][  0  ]  ); 
for  (  int  i  -  1;  i  <  n;  i++  ) 

f[  l  ].push_back(  f[l][i-l]+f[0][x 


)? 


//  check  that  the  integer  points  are  in  ascending  order 
for  (  int  i  =  1;  i  <  n;  i++  )  assert<  k[  i  -  1  ]  <  k[  i  ]  )? 


max  =f[l][n-l]; 

} 

//  select  a  uniform  random  number  between  0  and  the  maximum  value 
double  p  =  uniform(  0.f  max  ); 

//  locate  and  return  the  corresponding  index 

for  (  int  i  =  0;  i  <  n;  it+  )  if  (  p  <=  f[  1  1!  i  1  )  return  k[  i  ]; 
return  k[  n  -  1  ] ; 


(1)  The  data  must  reside  in  a  file  named  empiricalDiscrete. 

(2)  The  number  of  data  pairs  in  the  file  is  arbitrary  (and  is  not  a  required  input  as  the 
code  dynamically  allocates  the  memory  required). 
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As  an  example,  consider  the  following  hypothetical  empirical  data: 

2  0.2 

3  0.4 

5  0.1 

7  0.2 

9  0.1 

The  probability  density  function  and  cumulative  distribution  function  are  shown  in  Figures  73  and  74,  respectively. 


23456789  23456789 


k  ki 

Figure  73.  Discrete  Empirical  Density  Function.  Figure  74.  Discrete  Empirical  Distribution  Function. 
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5.3.3  Sampling  With  and  Without  Replacement 

Suppose  a  population  of  size  N  contains  K  items  having  some  attribute  in  common.  We  want  to  know  the  probabil¬ 
ity  of  getting  exactly  k  items  with  this  attribute  in  a  sample  size  of  n,  where  0  <k<n.  Sampling  with  replacement 
effectively  makes  each  sample  independent  and  the  probability  is  given  by  the  formula 


P(*)  = 


n\Kk{N -K)n~k 
k)  Nn 


where 


n\ 

k\(n~k)\  ‘ 


(48) 


(See  the  binomial  distribution  in  section  5.2.2.)  Let  the  data  be  represented  by  {xy,---,xN  }.  Then  an  algorithm  for 
sampling  with  replacement  is  as  follows. 

( 1 )  Generate  index  I  ~  UniformDiscrete  ( 1 ,  N ). 

(2)  Return  data  element 


And,  in  the  case  of  sampling  without  replacement,  the  probability  is  given  by  the  formula 

fKYN-K' 

>  _  U  k  w~*  - 

'N' 


P(k,n)  = 


where 


n\ 


k\(n-k)l 


(49) 


(See  the  hypergeometric  distribution  in  section  5.2.4.)  An  algorithm  for  this  case  is  as  follows. 

(1)  Perform  a  random  shuffle  of  the  data  points  {xu---,xN}.  (See  section  3.4.2  of  Knuth  [1969].) 

(2)  Store  the  shuffled  data  in  a  vector. 

(3)  Retrieve  data  by  sequentially  indexing  the  vector. 


The  following  source  code  implements  both  methods — i.e.,  sampling  with  and  without  replacement. 


double  sample (  bool  replace  *  true  ) 

{ 

static  vector <  double  >  v; 
static  bool  init  =  false? 
static  int  n; 
static  int  index  =0; 


//  Sample  w  or  w/o  replacement  from  a 
//  distribution  of  1-D  data  in  a  file 
//  vector  for  sampling  with  replacement 
//  flag  that  file  has  been  read  in 
//  number  of  data  elements  in  the  file 
//  subscript  in  the  sequential  order 


if  (  Jinit  )  { 

ifstream  in(  "sampleData"  )? 
if  (  !in  )  { 

cerr  <<  "Cannot  open 
exit(  1  ); 

} 

double  d; 

while  (  in  >>  d  )  v.push_back(  d  ); 
in. close ( ) ; 
n  =  v.size( ) ; 
init  =  true? 

if  {  replace  ==  false  )  {  //  sample  without  replacement 

//  shuffle  contents  of  v  once  and  for  all 

//  Eef :  Knuth,  D.  E.,  The  Art  of  Computer  Programming,  Vol.  2: 
//  Seminumerical  Algorithms.  London:  Addison-Wesley,  1969. 

for  (  int  i=n-l?i>0?i--){ 
int  j  =  int(  (  i  +  1  )  *  _u()  )? 
swap(  v[  i  ],  v[  j  ]  )? 

} 

} 

} 

//  return  a  random  sample 
if  (  replace  ) 

return  v[  UniformDiscrete (  0,  n  -  1  )  ]; 
else  { 

assert (  index  <  n  ) ; 
return  v [  index++  ] ; 

} 

} 


//  sample  w/  replacement 

//  sample  w/o  replacement 
//  retrieve  elements 
//  in  sequential  order 
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5.3.4  Stochastic  Interpolation 

Sampling  (with  or  without  replacement)  can  only  return  some  combination  of  the  original  data  points.  Stochastic 
interpolation  is  a  more  sophisticated  technique  that  will  generate  new  data  points.  It  is  designed  to  give  the  new  data 
the  same  local  statistical  properties  as  the  original  data  and  is  based  on  the  following  algorithm. 


(1)  Translate  and  scale  multivariate  data  so  that  each  dimension  has  the  same  range: 

x  =>  (x  —  xmjn)/lxmax  —  Xmjnl. 

(2)  Randomly  select  (with  replacement)  one  of  the  n  data  points  along  with  its  nearest  m  -  1  neighbors  x1?  •  •  * ,  xm_! 
and  compute  the  sample  mean: 

1  m 

x  =  —  £  X/. 


m 


i  =  l 


(3)  Generate  m  IID  uniform  variates 


and  set 


Ui~U 


Al-V3(m-1)  1+V3(m-1)N 


m 


m 


X  =  x+  '£(xi-x)Ui. 

i=i 

(4)  Rescale  X  by  (xmax  -  xmin)  and  shift  to  xmin. 

The  following  source  code  implements  stochastic  interpolation. 

//  comparison  functor  for  use  in  determining  the  neighborhood  of  a  data  point 

struct  dSquared  :  public  binary_function<  point,  point,  bool  >  { 
bool  operator()(  point  p,  point  q  )  [ 

return  p.x  *  p.x  +  p.y  *  p.y  <  q.x  *  q.x  +  q.y  *  q.y; 

} 

} ; 


point  stochastic!nterpolation(  void  ) 


//  Kefs:  Taylor,  M.  S.  and  J.  R.  Thompson,  Computational  Statistics  &  Data 
//  Analysis,  Vol,  4,  pp.  93-101,  1986;  Thompson,  J.  R.,  Empirical  Model 

//  Building,  pp.  108-114,  Wiley,  1989;  Bodt,  B.  A.  and  M.  S.  Taylor, 

//  A  Data  Based  Random  Number  Generator  for  A  Multivariate  Distribution  - 

//  A  User's  Manual,  ARBRL-TR-02439,  BR1,  APG,  MD,  Nov.  1982. 

C 


static  vector <  point  >  data; 
static  point  min, 

static  int  m; 

static  double  lower 

static  bool  init 


max; 

,  upper; 
=  false; 


if  (  linit  )  ( 

if stream  in(  "stochasticData"  ); 
if  (  !in  )  { 

cerr  <<  "Cannot  open  \nstochasticData\"  input  file"  <<  endl; 
exit(  1  ) ; 


//  read  in  the  data  and  set  min  and  max  values 


min.x  =  min.y  =  FLT_MAX ; 
max.x  =  max.y  =  FLTJMIN; 
point  p; 

while  (  in  >>  p.x  >>  p.y  )  { 


min.x 

min.y 

max.x 

max.y 


(  p . x  <  min .x  ?  p.x  :  min . x  ) ; 
(  p.y  <  min.y  ?  p.y  :  min.y  ); 
(  p . x  >  max .x  ?  p.x  :  max . x  ) ; 
(  p.y  >  max.y  ?  p.y  :  max.y  ); 


data.push_back(  p  ); 

} 

in. close ( ) ; 
init  =  true; 
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//  scale  the  data  so  that  each  dimension  will  have  equal  weight 

for  (  int  i  =  0?  i  <  data.size();  i++  )  { 

data [  i  ].x  =  (  data(  i  ] .x  -  min.x  )  /  (  max.x  -  min.x  )? 

data [  i  ].y  =  <  data[  i  ] .y  -  min.y  )  /  (  max.y  -  min.y  ); 

} 

//  set  m,  the  number  of  points  in  a  neighborhood  of  a  given  point 

m  =  data .size ( )  /  20?  //  5%  of  all  the  data  points 

if  (  m  <  5  )  m  =  5;  //  but  no  less  than  5 

if  (  m  >  20  )  m  =  20?  //  and  no  more  than  20 

lower  =  (  1.  -  sqrt(  3.  *  (  double*  m  )  -  1.  )  )  )  /  double<  m  )? 

upper  =  (  1.  +  sqrt(  3.  *  (  double(  m  )  -  1.  )  )  )  /  double(  m  )? 


//  uni  form  random  selection  of  a  data  point  (with  replacement) 
point  origin  =  data[  uniforming  0,  data. size ()  -  1  )  ]? 

//  make  this  point  the  origin  of  the  coordinate  system 
for  (  int  n  —  0?  n  <  data.size()?  n++  )  data[  n  ]  -=  origin? 

//  sort  the  data  with  respect  to  its  distance  (squared)  from  this  origin 
sort(  data. begin ( ),  data.end(),  dSquared( )  )? 

//  find  the  mean  value  of  the  data  in  the  neighborhood  about  this  point 

point  mean? 

mean.x  =  mean.y  =  0.? 

for  (  int  n=0?n<m?  n++  )  mean  +=  data[  n  ] ? 
mean  /=  double (  m  ) ? 

//  select  a  random  linear  combination  of  the  points  in  this  neighborhood 

point  p? 

p.x  =  p.y  =  0.; 

for  (  int  n  =  0;  n  <  m?  n++  )  { 
double  ra? 

if  (  m  ==  1  )  rn  »  1 . ? 

else  rn  =  uniform(  lower,  upper  )? 

p.x  +=  rn  *  (  data[  n  ] .x  -  mean.x  )? 
p.y  +=  rn  *  (  data[  n  ] .y  -  mean.y  )? 


//  restore  the  data  to  its  original  form 

for  (  int  n  =  0?  n  <  data.size()?  n++  )  data[  n  ]  +=  origin? 

//  use  the  mean  and  the  original  point  to  translate  the  randomly -chosen  point 

p  +=  mean? 
p  +=  origin? 

//  scale  the  -randomly-chosen  point  to  the  dimensions  of  the  original  data 

p.x  =  p.x  *  (  max.x  -  min.x  )  +  min.x? 
p.y  -  p.y  *  (  max.y  -  min.y  )  +  min.y? 

return  p? 

} 

Notes:  (1)  Notice  that  with  the  particular  range  on  the  uniform  distribution  in  step  3  of  the  algorithm  is 

chosen  to  give  a  mean  value  of  1/m  and  a  variance  of  (m  -  l)/m2. 

(2)  When  m  =  1,  this  reduces  to  the  bootstrap  method  of  sampling  with  replacement. 
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5.4  Bivariate  Distributions 

The  bivariate  distributions  described  in  this  section  make  use  of  one  or  more  of  the  following  parameters, 
cartesiancoord  -  a  Cartesian  point  (jc,  y)  in  two  dimensions. 

sphericaicoord  -  the  angles  ( 6 ,  <f> ),  where  0  is  the  polar  angle  as  measured  from  the  z-axis,  and 

</>  is  the  azimuthal  angle  as  measured  counterclockwise  from  the  jc-axis. 

p  -  correlation  coefficient,  where  -1  <  p  <  1. 

To  aid  in  selecting  an  appropriate  distribution.  Table  4  summarizes  some  characteristics  of  these  distributions.  The 
subsections  that  follow  describe  each  distribution  in  more  detail. 


Table  4.  Description  and  Output  for  Selecting  the  Appropriate  Bivariate  Distribution 


Distribution  Name 

Description 

Output 

bivariateNormal 

normal  distribution  in  two  dimensions 

cartesianCoord 

bivariateUniform 

uniform  distribution  in  two  dimensions 

cartesianCoord 

corrNormal 

normal  distribution  in  two  dimensions  with  correlation 

cartes ianCoord 

corrUniform 

uniform  distributionin  two  dimensions  with  correlation 

cartesiancoord 

spherical 

uniform  distribution  over  the  surface  of  the  unit  sphere 

sphericaicoord 

sphericalND 

uniform  distribution  over  the  surface  of  the  N -dimensional  unit  sphere 

c Xi,---,xN ) 
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5.4.1  Bivariate  Normal  (Bivariate  Gaussian) 


Density  Function: 
Input: 

Output: 

Mode: 

Variance: 

Algorithm: 

Source  Code: 

Notes: 


1 

- - exp 

2K  GXOy 


[  Ux-fix?  (y-^y)2]] 

\[  2ax  JJ 


Location  parameters  (ftx,  ny),  any  real  numbers; 
scale  parameters  (ax,  oy),  any  positive  numbers 


x  e  (-oo,  oo)  and  y  e  (-oo,  oo) 


(Mx’My) 

(ol,ay) 

(1)  Independently  generate  X  ~  N( 0, 1)  and  Y  ~  N( 0, 1) 

(2)  Return  (fix  +  My  +  ayY) 


cartesianCoord  bivar iateNormal (  double  muX, 

double  muY, 


{ 

assert(  sigmaX  >  0.  &&  sigmaY  >0.  )? 


double  sigmaX, 
double  sigmaY  ) 


cartesianCoord  p; 
p.x  =  normal (  mux,  sigmaX  ); 
p.y  =  normal (  muY,  sigmaY  ); 
return  p? 


The  variables  are  assumed  to  be  uncorrelated.  For  correlated  variables,  use  the  cor¬ 
related  normal  distribution. 


Two  examples  of  the  distribution  of  points  obtained  via  calls  to  this  function  are  shown  in  Figures  75  and  76. 


Figure  75.  bivariateNormaI(  0.,  1^  0.,  1. ).  Figure  76.  bivariateNormalf  0.,  1.,  -1, 0.5 ). 
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5.4.2  Bivariate  Uniform 


Density  Function: 
Input: 

Output: 

Algorithm: 

Source  Code: 


f(x,y) 


_L  o<<*-.^2.  +  <y-yo)2<i 

nab  a2  b2  " 

0  otherwise 


[*min,  *maxX  bounds  along  x-axis;  [ymin,  ymax),  bounds  along  y-axis; 

Location  parameters  (x0,  y0),  where  x0  =  (xmin  +  xmax)/2  and  y0  =  (ymin  +  ymax)/2; 
scale  parameters  (a,  b),  where  a  =  (xmax  -  xmin)/2  and  b  =  (ymax  -  ymin)/2  are  derived 


Point  (x,  y)  inside  the  ellipse  bounded  by  the  rectangle  [xmin,  xraax]  x  [ymin,  ymax] 


(1)  Independently  generate  X  ~U  (-1, 1)  and  Y  ~U  (-1, 1) 

(2)  If  X2  +  Y 2  >  1 ,  go  back  to  step  1 ;  otherwise  go  to  step  3 

(3)  Return  (xo  +  aX,yo  +  bY) 


cartes ianCoord  bivariateUnif6rm<  double  xMinr  double  xMax, 

double  yMinf  double  yMax  ) 
t 

assert (  xMin  <  xMax  &&  yMin  <  yMax  ); 


double 

xO  =  0.5 

*  (  xMin 

+ 

xMax 

)? 

double 

yO  =  0.5 

*  (  yMin 

+ 

yMax 

) ; 

double 

a  =0.5 

*  (  xMax 

- 

xMin 

); 

double 

b  =0.5 

*  (  yMax 

- 

yMin 

)? 

double 

x,  y; 

do  { 

x  = 

uniform( 

-l.r  I-  ) 

? 

y  - 

uniform ( 

"l.r  1-  ) 

? 

}  while  (  x  *  x  • 

*  y  *  y  > 

i 

-  >; 

cartesianCoord  p; 
p.x  =  xO  +  a  *  x? 
p.y  =  yO  +  b  *  y; 
return  p; 

} 


Notes:  Another  choice  is  to  use  a  bounding  rectangle  instead  of  a  bounding  ellipse. 

Two  examples  of  the  distribution  of  points  obtained  via  calls  to  this  function  are  shown  in  Figures  77  and  78. 


Figure  77.  bivariateUniform(  0.,  1.,  0.,  1. ). 


Figure  78.  bivariateUniform(  0.,  1.,  -1.,  0.5 ). 
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5.4.3  Correlated  Normal 


Density  Function: 
Input: 

Output: 

Mode: 

Variance: 

Correlation  Coefficient: 
Algorithm: 

Source  Code: 


1  (JC-^)2  p(x-  nx){y-  Hy)  |  (y  -  My)2  1 

1  -p2  2cr|  oxoy  2o\  Jj 


Location  parameters  (fix,  fiy),  any  real  numbers;  positive  scale  parameters  (ox,  cry); 
correlation  coefficient,  -1</?<1 


Point  ( x ,  y),  where  x  e  (-oo,  oo)  and  y  <=  (-oo,  oo) 


(Px,  Py) 

(c*Wy) 


P 

(1)  Independently  generate  X  ~  A^(0, 1)  and  Z  ~  AT(0, 1) 

(2)  Set Y  =  pX  +  JT^ftZ 

(3)  Return  (ux  +  oxX,  fiy  +  oyY) 

cartesianCoord  corrNormal(  double  rf  double  muX,  double  sxgmaXr 

double  rauY,  double  sigma Y  ) 

^  as Serb (  -1.  <=  r  &&  r  <=  1.  )?  //  bounds  on  corr  coeff 

assert (  sigmaX  >  0.  &&  sigmaY  >  0.  );  //  positive  std  dev 

double  x  =  normal ( ) ; 
double  y  -  normal ( ) ; 

y  =  r  *  x  +  sqrt (  1.  -  r  *  r  )  *  y;  //  correlate  the  variables 

cartesianCoord  p; 

p . x  —  muX  +  sigmaX  *  x;  //  translate  and  scale 

p.y  =  muY  +  sigmaY  *  y; 
return  p; 


Two  examples  of  the  distribution  of  points  obtained  via  calls  to  this  function  are  shown  in  Figures  79  and  80. 


-3-2-10123  -3-2-10123 


Figure  79.  corrNormal(  0.5, 0.,  1.,  0.,  1. ).  Figure  80.  corrNormal(  -0.75, 0.,  1.,  0.,  0.5  ). 
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5.4.4  Correlated  Uniform 


Input: 

Output: 

Algorithm: 

Source  Code: 


p,  correlation  coefficient,  where  -1  <  p  <  1;  [xmm,  xmax),  bounds  along  v-axis; 
timin’  JWX  bounds  along  y-axis 

Location  parameters  (x0,  y0),  where  x0  =  (xmin  +  xmax)/2  and  y0  =  (ymm  +  yma*)/ 2; 
scale  parameters  (a,  b),  where  a  =  (xmax  -  jcmin)/2  and  b  =  ( ymax  -  ymin)/2  are  derived 

Correlated  points  (x,  y)  inside  the  ellipse  bounded  by  the  rectangle  [xmin.  xmax]  x  [ymin,  ymax] 

(1)  Independently  generate  X  ~U(- 1, 1)  and  Z  ~  U(-l,  1) 

(2)  If  X2  +  Z2  >  1,  go  back  to  step  1 ;  otherwise  go  to  step  3 

(3)  Set Y  =  pX  +  yfT^Z 

(4)  Return  (jc0  +  aX ,  ;y0  +  bY) 

carte sianCoord  cor rUni form (  double  x,  double  xMin,  double  xMaxr 

double  yMin,  double  yMax  ) 

t 

assert(  -1.  <=  r  &&  r  <=  1.  ) ?  //  bounds  on  corx  coeff 

assert(  xMin  <  xMax  &&  yMin  <  yMax  ) ? 

double  xO  =  0 . 5  *  (  xMin  +  xMax  ) ; 

double  yO  =  0 . 5  *  (  yMin  +  yMax  ) ; 

double  a  =  0 . 5  *  (  xMax  -  xMin  ) ; 

double  b  =  0 . 5  *  (  yMax  -  yMin  ) ? 

double  x,  y; 

do  { 

x  =  uniform (  -l.f  1.  ); 
y  =  uni£orm(  -1.,  1.  ); 

}  while  (x*x+y*y>l.  ); 

y  s  r  *  x  +  sqrt(  1.  -r*r)*y;  //  correlate  variables 

cartesianCoord  p; 

p.x  =  xO  +  a  *  x;  //  translate  &  scale 

p.y  =  yO  +  b  *  y; 
return  p; 


Two  examples  of  the  distribution  of  points  obtained  via  calls  to  this  function  are  shown  in  Figures  81  and  82. 


0  0.5  1  0  0.5  1 


Figure  81.  corrUniform(  0.5, 0.,  1.,  0.,  1. ). 


Figure  82.  corrUniformf  -0.75, 0.,  1.,  0.,  0.5  ). 
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5.4.5  Spherical  Uniform 


Density  Function: 


Distribution  Function: 


sin  0 

(0max  —  0min)(cos  ^min  —  cos  ^max) 

(^-<»min)(COSgmin-COSg) 

(^max  —  0min)(co!*  ^min  —  c®s  ^max) 


for 


for 


[Q<9min<9<x 

{  0  ^  0min 

{O<0min<9<x 

\o<</>miD<<j><2x 


Input: 


Output: 

Mode: 

Mean: 

Variance: 

Algorithm: 

Source  Code: 


minimum  polar  angle  0min  >  0; 
maximum  polar  angle  0max  <  ?r; 
minimum  azimuthal  angle  0min  >  0; 
maximum  azimuthal  angle  0max  <  2 it 

(0, 4>)  pair,  where  9  e  [6»min,  0max]  and  ^  e  [0min, 

Does  not  uniquely  exist,  as  angles  are  uniformly  distributed  over  the  unit  sphere 

(  (^rmn  "t"  ^max  )2.  (0min  "t"  0maxV2  ) 

(  (9max  ~  9mm)  /12,  (0max  —  0min)  /12  ) 

(1)  Generate  Ut  ~  U (cos  0max ,  cos  9min)  and  U2  ~  U(</>m in.^max)- 

(2)  Return  0  =  cos ~l(,U{)  and  O  =  C/2. 

sphericalCoord  spherical (  double  thMin,  double  thMax, 

double  phMin,  double  phMax  ) 

^  assert{  0.  <=  thMin  &&  thMin  <  thMax  &&  thMax  <=  M_PI  && 

0.  <=  phMin  &&  phMin  <  phMax  &&  phMax  <=  2.  *  M_PI  ); 

sphericalCoord  p; 

p. polar  =  acos(  uniform(  cos(  thMax  ),  cos(  thMin  )  )  ); 

p. azimuth  =  uniform(  phMin,  phMax  ); 
return  p? 


Figure  83  shows  the  uniform  random  distribution  of  1,000  points  on  the  surface  of  the  unit  sphere  obtained  via  calls 
to  this  function. 


Figure  83.  Uniform  Spherical  Distribution  via  spherical(). 
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5.4.6  Spherical  Uniform  in  N-Dimensions 

This  will  generate  uniformly  distributed  points  on  the  surface  of  the  unit  sphere  in  n  dimensions.  Whereas  the  previ¬ 
ous  distribution  (5.4.5)  is  designed  to  return  the  location  angles  of  the  points  on  the  surface  of  the  three-dimensional 
unit  sphere,  this  distribution  returns  the  Cartesian  coordinates  of  the  points  and  will  work  for  an  arbitrary  number  of 
dimensions. 


Input: 

Output: 

Algorithm: 


Source  Code: 


Vector  X  to  receive  values;  number  of  dimensions  n 
Vector  X  of  unit  length  (i.e.,  X?  +  •  •  •  +  X\  -  1) 

(1)  Generate  n  IID  normal  variates  X! ,  •  •  • ,  X„  ~  N(0, 1) 

(2)  Compute  the  distance  from  the  origin,  d  =  -\/Xf  +  ---  +  X2 

(3)  Return  vector  X/d,  which  now  has  unit  length 

void  sphericalND (  double  x [ ] ,  //  x  array  returns  point 

int  n  )  //  n  is  number  of  dimensions 

{ 

//  generate  a  point  inside  the  unit  n- sphere  by  normal  polar  method 
double  r2  =  0 . ; 

for  (  int  i  =  0;  i  <  n;  i++  )  { 
x  [  i  ]  =  normal ( ) ; 
r2  +=  x[  i  ]  *  x[  i  ] ; 

} 

//  project  the  point  onto  the  surface  of  the  n- sphere  by  scaling 

const  double  A  =  1.  /  sqrt(  r2  ); 

for  <  int  i  =  0;  i  <  n;  i++  )  x[  i  ]  *=  A; 


Notes: 


(1)  When  n  =  1,  this  returns  {-  1,+1}. 

(2)  When  n  =  2,  it  generates  coordinates  of  points  on  the  unit  circle. 

(3)  When  n  =  3,  it  generates  coordinates  of  points  on  the  unit  3-sphere. 


5.5  Distributions  Generated  From  Number  Theory 

This  section  contains  two  recipes  for  generating  pseudo-random  numbers  through  the  application  of  number  theory: 

(1)  Tausworthe  or  Shift  Register  Generation  of  Random  Bits,  and 

(2)  Maximal  Avoidance  or  Sub-Random  Sequences. 

5.5.1  Tausworthe  Random  Bit  Generator 

Very  fast  random  bit  generators  have  been  developed  based  on  the  theory  of  Primitive  Polynomials  Modulo  Two 
(Tausworthe  1965).  These  are  polynomials  of  the  form 

Pn(x)  =  (Xn  +  an-t xn~l  +  •  •  •  +  axx +  1)  mod  2,  (50) 

where  n  is  the  order  and  each  coefficient  a,  is  either  1  or  0.  The  polynomials  are  prime  in  the  sense  that  they  cannot 
be  factored  into  lower  order  polynomials  and  they  are  primitive  in  the  sense  that  the  recurrence  relation 

an  =  ( xn  +  an^{xn"1  +--'  +  axx+  1)  mod  2  (51) 

will  generate  a  string  of  1  ’s  and  0’s  that  has  a  maximal  cycle  length  of  2*  —  1  (i.e.,  all  possible  values  excluding  the 
case  of  all  zeroes).  Primitive  polynomials  of  order  n  from  1  to  100  have  been  tabluated  (Watson  1962). 

Since  the  truth  table  of  integer  addition  modulo  2  is  the  same  as  “exclusive  or,”  it  is  very  easy  to  implement  these 
recurrence  relations  in  computer  code.  And,  using  the  separate  bits  of  a  computer  word  to  store  a  primitive  polyno¬ 
mial  allows  us  to  deal  with  polynomials  up  to  order  32,  to  give  cycle  lengths  up  to  4,294,967,295. 

The  following  code  is  overloaded  in  the  C++  sense  that  there  are  actually  two  versions  of  this  random  bit  generator. 
The  first  one  will  return  a  bit  vector  of  length  n ,  and  the  second  version  will  simply  return  a  single  random  bit  Both 
versions  are  guaranteed  to  have  a  cycle  length  of  2”  —  1 . 

Input:  Random  number  seed  (not  zero),  order  n  (1  <  n  <  32), 

and,  for  first  version,  an  array  to  hold  the  bit  vector 

Output:  Bit  vector  of  length  nor  a  single  bit  (i.e.,  1  or  0) 

Source  Code:  void  tausworthe  (  bool*  bitvec,  unsigned  n  )  //  returns  bit  vector  of  length  n 

//  It  is  guaranteed  to  cycle  through  all  possible  combinations  of  n  bits  ^ 

//  (except  all  zeros)  before  repeating,  i.e.,  cycle  is  of  maximal  length  2  n-1. 

//  Ref:  Press,  W.  H.,  B.  P.  Flannery,  S.  A.  Teukolsky  and  W.  T.  Vetterling, 

//  Numerical  Recipes  in  C,  Cambridge  Univ.  Press,  Cambridge,  1988. 

assert(  1  <=  n  &&  n  <=  32  ) ;  //  length  of  bit  vector 

if  (  _seed2  &  BIT[  n  ]  ) 

_seed2  =  (  (  _seed2  *  MASK[  n  ]  )  <<  1  )  |  BIT(  1  ]; 

else 

_seed2  <<=  1; 

for  (  int  i—0;  i<n;  i++  )  bitvec [  i  ]  —  _seed2  &  (  BIT[  n  ]  >>  i  ); 

1 

bool  tausworthe (  unsigned  n  )  //  returns  a  single  random  bit 

I 

assert(  1  <=  n  &&  n  <=  32  ) ; 
if  (  _seed2  &  BIT[  n  ]  )  { 

_seed2  =  (  (  _seed2  ~  MASK[  n  ]  )  «  1  )  |  BIT[  1  ]? 
return  true; 

} 

else  { 

__seed2  <<=  1; 
return  false? 

1 

} 

Notes:  (1 )  The  constants  used  in  the  above  source  code  are  defined  in  Random .  h. 

(2)  This  generator  is  3.6  times  faster  than  bernoulli  (  0.5  ). 


*  The  theory  underlying  these  techniques  is  quite  involved,  but  Press  et  al.  (1992)  and  sources  cited  therein  provide  a  starting  point. 


65 


5.5.2  Maximal  Avoidance  (Quasi-Random) 

Maximal  avoidance  is  a  technique  for  generating  points  in  a  multidimensional  space  that  are  simultaneously  self¬ 
avoiding,  while  appearing  to  be  random.  For  example,  the  first  three  plots  in  Figure  84  show  points  generated  with 
this  technique  to  demonstrate  how  they  tend  to  avoid  one  another.  The  last  plot  shows  a  typical  distribution  obtained 
by  a  uniform  random  generator,  where  the  clustering  of  points  is  apparent. 


1000  Maximal  Avoidance  Data  Points  1000  Uniformly  Distributed  Data  Points 

Figure  84.  Maximal  Avoidance  Compared  to  Uniformly  Distributed. 
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The  placement  of  points  is  actually  not  pseudo-random  at  all  but  rather  quasi-random,  through  the  clever  application 
of  number  theory.  The  theory  behind  this  technique  can  be  found  in  Press  et  al.  (1992)  and  the  sources  cited  therein, 
but  we  can  give  a  sense  of  it  here.  It  is  somewhat  like  imposing  a  Cartesian  mesh  over  the  space  and  then  choosing 
points  at  the  mesh  points.  By  basing  the  size  of  the  mesh  on  successive  prime  numbers  and  then  reducing  its  spacing 
as  the  number  of  points  increases,  successive  points  will  avoid  one  another  and  tend  to  fill  the  space  in  an  hierarchi¬ 
cal  manner.  The  actual  application  is  much  more  involved  than  this  and  uses  some  other  techniques  (such  as  primi¬ 
tive  polynomials  modulo  2,  and  Gray  codes)  to  make  the  whole  process  very  efficient.  The  net  result  is  that  it  pro¬ 
vides  a  method  of  sampling  a  space  that  represents  a  compromise  between  systematic  Cartesian  sampling  and  uni¬ 
form  random  sampling.  Monte  Carlo  sampling  on  a  Cartesian  grid  has  an  error  term  that  decreases  faster  than  N 
that  one  ordinarily  gets  with  uniform  random  sampling.  The  drawback  is  that  one  needs  to  know  how  many  Carte¬ 
sian  points  to  select  beforehand.  As  a  consequence,  one  usually  samples  uniform  randomly  until  a  convergence  cri¬ 
terion  is  met.  Maximal  avoidance  can  be  considered  as  the  best  of  both  of  these  techniques.  It  produces  an  error 
term  that  decreases  faster  than  N~in  while  at  the  same  time  providing  a  mechanism  to  stop  when  a  tolerance  crite¬ 
rion  is  met.  The  following  code  is  an  implementation  of  this  technique. 

double  avoidance (  void  )  //  1-dimension  (overloaded  for  convenience) 

C 

double  x [  1  ] ; 
avoidance (  x,  1  ) ? 
return  x [  0  ] ; 

void  avoidance (  double  x[] ,  int  ndim  )  //  multi-dimensional 

static  const  int  MAXBIT  «  30? 
static  const  int  MAXDIM  =6; 

assert(  ndim  <=  MAXDIM  ) ; 

static  unsigned  long  ix[  MAXDIM  +  1  ]  -  {  0  } ; 

static  unsigned  long  *u [  MAXBIT  +  1  ] ;  ...  _  .  . 

static  unsigned  long  mdeg[  MAXDIM  +!]*£//  degree  of  primitive  polynomial 
0,  1,  2,  3,  3 ,  4,  4 

static  unsigned  long  p[  MAXDIM  +  1  ]  ■  {  //  decimal  encoded  interior  bits 

0^  Of  If  If  2,  If  4 

static  unsigned  long  v[  MAXDIM  *  MAXBIT  +  1  ]  *  C 
Of  If  If  If  If  if  If 

3 f  If  3 r  3 r  If  If 

5,  7 f  7 r  3,  3 ,  5 , 

15,  11,  5,  15 ,  13,  9 

} ; 

static  double  fac; 
static  int  in  =  -1; 
int  j,  k; 

unsigned  long  i,  m,  pp? 

if  (  in  --  -1  )  { 
in  =  0? 

fac  =  1.  /  (  1L  «  MAXBIT  )? 

for  (  j  =  1,  k  *  0?  j  <=  MAXBIT;  j++,  k  +=  MAXDIM  )  u[  3  ]  *  *v[  k  ]? 

for  (  k  “  If  k  <=  MAXDIM?  k++  )  { 

for  (  j  =  1;  j  <=  mdeg[  k  };  j++  )  u[  j  ] [  k  ]  <<=  (  MAXBIT  -  3  )? 

for  (  j  =  mdegf  k  1  +  1;  j  <=  MAXBIT;  j++  )  { 

pp  =  p[  k  ]; 

i  =  u[  j  -  mdeg[  k  ]  ]  [  k  ] ; 
i  *=  (  i  >>  mdeg [  k  ]  )? 

for  (  int  n  =  mdeg[  k  ]  -  1?  n  >=  1;  n--  )  { 
if  (  pp  S  1  )  i  A=  u[  j  -  n  ][  k  ]? 
pp  >>=  I? 

} 

u[  j  ][  k  ]  -  if 

} 

} 

} 

m  =  in++; 

for  (  j  =  0?  j  <  MAXBIT?  j++,  m  >>=  1  )  if  (  »(  m  &  1  )  )  break; 
if  (  j  >=  MAXBIT  )  exit(  1  )? 
m  =  j  *  MAXDIM? 

for  (  k  =  0;  k  <  ndim;  k++  )  { 
ix[  k  +  1  ]  ~=v[  m  +  k  +  1  ]  ? 
x[  k  ]  =  ix[  k  +  1  ]  *  fac; 

} 

} 
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6. 


DISCUSSION  AND  EXAMPLES 


This  section  presents  some  example  applications  in  order  to  illustrate  and  facilitate  the  use  of  the  various  distribu¬ 
tions.  Certain  distributions,  such  as  the  normal  and  the  Poisson,  are  probably  over  used  and  others,  due  to  lack  of 
familiarity,  are  probably  under  used.  In  the  interests  of  improving  this  situation,  the  examples  make  use  of  the  less 
familiar  distributions.  Before  we  present  example  applications,  however,  we  first  discuss  some  differences  between 
the  discrete  distributions. 

6.1  Making  Sense  of  the  Discrete  Distributions 

Due  to  the  number  of  different  discrete  distributions,  it  can  be  a  little  contusing  to  know  when  each  distribution  is 
applicable.  To  help  mitigate  this  confusion,  let  us  illustrate  the  difference  between  the  binomial,  geometric,  negative 
binomial,  and  Pascal  distributions.  Consider,  then,  the  following  sequence  of  trials,  where  “1”  signifies  a  success 
and  “0”  a  failure. 


TVial:  1  2  3  4  5  6  7  8 

Outcome:  10  1110  0  1 

The  binomial  ( n ,  p)  represents  the  number  of  successes  in  n  trials  so  it  would  evaluate  as  follows. 

binomial (  1  ,  p  )  =  1 

binomial (  2  ,  p  )  -  1 

binomial (  3  ,  p  )  =  2 

binomial (  4  ,  p  )  =  3 

binomial (  5  ,  p  )  =  4 

binomial (  6  ,  p  )  =  4 

binomial (  7  ,  p  )  -  4 

binomial (  8  ,  p  )  -  5 

The  geometric  ( p )  represents  the  number  of  failures  before  the  first  success.  Since  we  have  a  success  on  the  first 
trial,  it  evaluates  as  follows. 

geometric (  p  )  =  0 

The  negativeBinomial  (s,  p)  represents  the  number  of  failures  before  the  5th  success  in  n  trials  so  it  would  evaluate 
as  follows. 


negativeBinomial (  1  ,  p  )  =  0 

negativeBinomial (  2  ,  p  )  =  1 

negativeBinomial (  3  ,  p  )  =  1 

negativeBinomial (  4  ,  p  )  =  1 

negativeBinomial (  5  ,  p  )  =  3 

The  pascal  (5,  p)  represents  the  number  of  trials  in  order  to  achieve  5  successes  so  it  would  evaluate  as  follows. 


pascal ( 

1  ; 

-  P 

) 

=  1 

pascal ( 

2  , 

-  P 

) 

=  3 

pascal ( 

3  , 

-  P 

) 

=  4 

pascal ( 

4  , 

-  P 

) 

-  5 

pascal ( 

5  , 

-  P 

) 

-  8 
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6.2  Adding  New  Distributions 

We  show  here  how  it  is  possible  to  extend  the  list  of  distributions.  Suppose  that  we  want  to  generate  random  num¬ 
bers  according  to  the  probability  density  function  shown  in  Figure  85. 


Figure  85.  Semi-Elliptical  Density  Function. 

The  figure  is  that  of  a  semi-ellipse,  and  its  equation  is 

f(x)  =  —  V 1  —  jc2  ,  where  - 1  <  x  <  1.  (52) 

7C 

Integrating,  we  find  that  the  cumulative  distribution  function  is 


.  1  W 1  -  x2  +  sin  *(■*) 

FW„-+ - - - 


,  where 


-1  <*<  1. 


(53) 


Now,  this  expression  involves  trancendental  functions  in  a  nonalgebraic  way,  which  precludes  inverting.  But,  we  can 
still  use  the  acceptance-rejection  method  to  turn  this  into  a  random  number  generator.  We  have  to  do  two  things. 

(1)  Define  a  function  that  returns  a  value  for  y,  given  a  value  for  x. 

(2)  Define  a  circular  distribution  that  passes  the  function  pointer  to  the  User-Specified  distribution. 

Here  is  the  resulting  source  code  in  a  form  suitable  for  inclusion  in  the  Random  class. 


double  ellipse (  double  x,  double,  double  )  //  Ellipse  Function 

return  sqrt(  1.  -  x  *  x  )  /  M_PI_2; 

} 

double  Random: : elliptical (  void  )  //  Elliptical  Distribution 

{ 

const  double  X_MIN  -  -1.; 
const  double  X_MAX  «  1 . ; 
const  double  Y_MIN  -  0 . ; 
const  double  Y_MAX  -  1.  /  M__PI_2? 

return  user Specif ied(  ellipse,  X_MIN,  X_MAX,  Y_MIN,  Y_MAX  )? 

} 


And  here  is  source  code  to  make  use  of  this  distribution. 


#include  <iostream.h> 

# include  "Random. h" 

void  main(  void  ) 

{ 

Random  rv; 

for  (  int  i  -  0;  i  <  1000;  i++  )  cout  <<  rv . elliptical ( )  <<  endl; 
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6.3  Bootstrap  Method  as  an  Application  of  Sampling 

If  we  are  only  interested  in  the  mean  value,  x,  of  a  set  of  data  and  wish  to  know  the  accuracy  of  the  sample  mean, 
then  there  is  a  handy  formula  available: 


Standard  Error  of  the  Mean  = 


“1 1/2 


(54) 


On  the  other  hand,  if  we  are  interested  in  some  other  metric,  such  as  the  correlation  coefficient,  then  there  is  no  sim¬ 
ple  analytical  formula  that  allows  us  to  estimate  the  error.  The  bootstrap  method  was  designed  to  address  this  situa¬ 
tion.  The  basic  idea  is  that  we  sample  the  original  data  with  replacement  to  obtain  a  synthetic  data  set  of  another  N 
data  points,  and,  from  this,  we  compute  the  statistic  we  are  interested  in.  We  then  repeat  this  process  over  and  over 
until  we  have  built  up  a  set  M  of  computed  values  of  the  relevant  statistic.  We  then  compute  the  standard  deviation 
of  these  M  values;  it  will  provide  the  standard  error  of  the  statistic.  Given  the  high  cost  and  consequent  scarcity  of 
data  in  many  applications,  combined  with  the  reduced  cost  and  abundance  of  computing  power,  the  bootstrap 
method  becomes  a  very  attractive  technique  for  extracting  information  from  empirical  data  (Diaconis  and  Efron 
1983;  Efron  and  Tibshirani  1991).  The  New  York  Times  had  this  to  say: 

A  new  technique  that  involves  powerful  computer  calculations  is  greatly  enhancing  the  statistical 
analysis  of  problems  in  virtually  all  fields  of  science.  The  method,  which  is  now  surging  into 
practical  use  after  a  decade  of  refinement,  allows  statisticians  to  determine  more  accurately  the 
reliability  of  data  analysis  in  subjects  ranging  from  politics  to  medicine  to  particle  physics.... 

(Nov.  8,  1988,  Cl,  C6). 

Here,  we  give  an  example  of  how  sampling  may  be  applied  to  empirical  data  in  order  to  compute  a  bootstrap  error 
estimate.  The  data  consist  of  150  spall  fragments  that  were  collected  when  a  penetrator  perforated  a  plate  of  armor. 
Each  spall  fragment  was  weighed  and  the  dimensionless  shape  factor  (see  section  3.7  for  the  definition)  was  mea¬ 
sured  from  16  different  directions  in  order  to  compute  an  average  shape  factor.  Thus,  the  experimental  data  consist 
of  150  mass,  average  shape  factor  pairs.  The  question  arises  as  to  whether  there  is  any  correlation  between  mass  and 
average  shape  factor.  For  example,  one  might  expect  small  fragments  to  be  more  compact  and  large  fragments  to  be 
more  irregular  in  shape.  This  would  be  reflected  in  a  positive  correlation  coefficient.  The  correlation  coefficient 
computed  from  the  original  experimental  data  is  -0. 132874.  Since  the  absolute  value  is  considerably  smaller  than 
1 ,  there  appears  to  be  no  correlation  between  mass  of  the  fragment  and  its  average  shape  factor.*  Now,  we  would  like 
to  know  how  much  variation  to  expect  in  the  correlation  coefficient.  The  150  data  pairs  were  put  into  a  file  called 
“sampleData.”  The  following  source  code  then  implements  the  bootstrap  method. 


tinclude  <iostream.h> 
#include  "Random. h" 


void  main(  void  ) 

{ 

const  int  NJDATA  =  150;  //  number  of  data  points 

const  int  N_DIMS  =2;  //  number  of  dimensions 

Random  rv; 


double  data [  N_DIMS  ] ; 
for  (  int  i  =  0;  i  <  N_DATA ;  i++  )  { 
rv. sample (  data,  N_DIMS  ); 

cout  «  data [  0  ]  «  "  "  «  data [  1  ]  «  endl ; 

} 

} 


This  will  generate  a  synthetic  set  of  150  data  pairs,  from  which  we  compute  the  corresponding  correlation  coeffi¬ 
cient.  We  then  replicate  the  process  128,  256,  512,  and  1,024  times.  After  128  replications,  we  find  the  following 
statistics  on  the  state  of  correlation  among  the  two  variables,  shape  factor  and  mass. 


More  formally,  the  value  of  the  t  statistic  is  -1 . 63094,  and  since  I  -  1. 63094 1  <  1 . 96,  the  critical  value  for  a  two-sided  test  at  a  0.05  signifi¬ 
cance  level,  it  fails  the  Student  t  test.  Consequently,  we  cannot  reject  the  null  hypothesis  that  the  data  pairs  are  uncorreiated. 
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n 

= 

128 

min 

-0.251699 

max 

= 

-0.0775172 

sum 

= 

-17.8033 

ss 

= 

2.60043 

mean 

5S2 

-0.139088 

var 

= 

0.000978001 

sd 

= 

0.031273 

se 

= 

0.00276417 

skew 

- 

-0.83868 

kurt 

= 

4.44186 

The  statistics  after  256  replications  are  as  follows. 


n 

= 

256 

min 

= 

-0.247179 

max 

= 

0.00560571 

sum 

= 

-36.5577 

ss 

5.51328 

mean 

= 

-0.142803 

var 

= 

0.00114789 

sd 

= 

0.0338805 

se 

0.00211753 

skew 

= 

0.254268 

kurt 

= 

4.59266 

The  statistics  after  512  replications  are  as  follows. 


n 

as 

512 

min 

= 

-0.247179 

max 

= 

0.00560571 

sum 

ss 

-72.0359 

ss 

= 

10.7341 

mean 

= 

-0.140695 

var 

= 

0.00117227 

sd 

= 

0.0342384 

se 

= 

0.00151314 

skew 

= 

0.161558 

kurt 

3.91064 

The  statistics  after  1,024  replications  are  as  follows. 


n 

= 

1024 

min 

= 

-0.280715 

max 

= 

0.00560571 

sum 

= 

-142.313 

ss 

= 

21.0328 

mean 

= 

-0.138978 

var 

= 

0.00122616 

sd 

as 

0.0350165 

se 

= 

0.00109427 

skew 

= 

0.118935 

kurt 

= 

4.05959 

Thus,  we  may  say  (with  i-a  confidence)  that  the  correlation  coeffient  is  -0. 139  ±  0. 035  and  conclude  that  the  data 
are  uncorrelated. 

Performing  a  bootstrap  on  the  t  statistic  gives  the  following  results. 


N 


value  of  t 


statistic 


128 

256 

512 

1024 


-1.72921 

-1.70181 

-1.70739 

-1.68787 


±  0.401098 
±  0.407198 
±  0.445316 
±  0.428666 
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6.4  Monte  Carlo  Sampling  to  Evaluate  an  Integral 

A  simple  application  of  random  number  distributions  is  in  the  evaluation  of  integrals.  Integration  of  a  function  of  a 
single  variable,  /(*),  is  equivalent  to  evaluating  the  area  that  lies  below  the  function.  Thus,  a  simple  way  to  esti¬ 
mate  the  integral 

b 

J  /(*)  dx  (55) 

a 

is  to  first  find  the  bounding  rectangle  [a,  b ]  x  [0,  ym aJ,  as  shown  in  Figure  86;  select  uniform  random  points  (X,y) 
within  the  rectangle;  and,  for  every  point  that  satisfies  the  condition  Y  <  f(X)y  increment  the  area  estimate  by  the 
amount  ( b  -  a)ymax/vV,  where  N  is  the  total  number  of  sample  points  in  the  bounding  rectangle. 


y 


Figure  86.  Integration  as  an  Area  Evaluation  via  Acceptance-Rejection  Algorithm 
This  can  be  accomplished  with  the  following  source  code. 


#include  <iostream.h> 

# include  "Random. h" 

double  f(  double  x  )  {  return  } 

void  main(  void  ) 

{ 

Random  rv; 

const  double  A  =  ... 

const  double  B  =  ... 

const  double  Y_MAX  = 

const  int  N  »  ... 

double  area  =  0 . ; 
for  (  int  i  =  0;  i  <  N;  i++  )  { 
double  x  =  rv.uniform(  A,  B  ); 
double  y  -  rv. uniform (  0.,  Y_MAX  ); 
if  (  y  <  f(  x  )  )  area  +=  (  B  -  A  )  *  Y_MAX  /  N; 

} 

cout  «  "area  estimate  =  "  «  area  «  endl; 

} 


This  is  essentially  a  binomial  process  with  a  probability  of  success  p  equal  to  the  ratio  of  the  area  under  the  curve  to 
the  area  of  the  bounding  rectangle.  The  standard  deviation  of  the  area  estimate,  therefore  (see  section  5.2.2),  is 


a 


Va'pCI  -p) 

N 


x  area  of  bounding  rectangle. 


(56) 


Note  that  the  factor  Vp(1  -  p)  is  close  to  0.5  unless  we  happen  to  have  a  very  close-fitting  bounding  rectangle.  This 
so-called  “hit-or-miss”  method  is  also  inefficient  in  that  two  calls  are  made  to  the  random  number  generator  for 
each  sample  point.  A  more  efficient  method  is  obtained  by  first  approximating  the  integral  in  eq.  (55)  by  its  Rie- 
mann  sum: 
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(57) 


f  fix)  dx  -  £  f(Xi)  Ax,  =  (f>  -  a)  ^  £  /(*<)• 

*  j=l  **  i = 1 

a 

This  dispenses  with  the  bounding  rectangle  and  only  requires  uniform  random  points  along  the  x-axis.  Conse¬ 
quently,  the  source  code  can  be  simplified  to  the  following. 


#include  <iostream.h> 

# include  "Random. h" 

double  f(  double  x  )  {  return  •••  } 

void  main(  void  ) 

{ 

Random  rv; 

const  double  A  *  ••• 

const  double  B  *  ••• 

const  int  N  * 

double  sum  -  0 .  ; 

for  (  int  i  -  0;  i  <  N;  i++  )  sum  +=  f(  rv.uniform(  A,  B  )  ); 
cout  «  "area  estimate  *  "  «  (  B  -  A  )  *  sum  /  N  «  endl; 

} 

Notice  that  eq.  (57)  expresses  the  integral  as  (b  -  a)  x  mean  value  of  /.  Thus,  we  increase  accuracy  to  the  extent 
that  the  points  are  spread  uniformly  over  the  x-axis.  Maximal  avoidance  is  perfectly  suited  to  do  this  and  avoid  the 
clustering  of  points  that  we  get  with  the  uniform  random  generator.  This  can  be  accomplished  by  simply  replacing 

rv. uniform(  A,  B  )  =>  rv . avoidance ( )  *  (  B  -  A  ) 


in  the  above  source  code. 

To  illustrate  the  advantage  of  maximal  avoidance  over  uniform  random  in  a  simple  case,  let  us  consider  the  cosine 
density: 


/w=icos(V}  °-jc-1’ 

where  a  =  1/2  and  b  =  1/x.  The  integral  can  be  performed  analytically: 

F(x)  =  ]/(|)^=i  1+sinfc-^j 

o  L  v  - 

and  F(l)  =  1.  Table  5  shows  the  errors  with  the  two  methods. 


0<x<  1, 


(58) 


(59) 


Table  5.  Comparison  of  Uniform  Random  and  Maximal  Avoidance  in  Monte  Carlo  Sampling 


Number  of 
Sample  Points 

Uniform  Random 
Value  %  Error 

Maximal  Avoidance 
Value  %  Error 

100 

1,000 

10,000 

100,000 

1,000,000 

1.00928  +0.93 

0.993817  -0.6183 

1.00057  +0.057 

0.999771  -0.0229 

1.00026  +0.026 

1.01231  +1.23 

1.0005  +0.05 

1.00015  +0.015 

1.00001  +0.001 

1  <  10*5 

It  can  be  shown  (e.g.,  Press  et  al.  [1992]  pp.  309-314)  that  the  fractional  error  term  in  maximal  avoidance  decreases 
as  In  AW,  which  is  almost  as  fast  as  UN.  In  contrast,  the  fractional  error  term  in  uniform  random  sampling 
decreases  as  N~m,  the  same  as  in  the  hit-or-miss  Monte  Carlo  sampling  (cf.  eq.  [56]). 
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6.5  Application  of  Stochastic  interpolation 

The  technique  of  stochastic  interpolation  is  very  useful  in  those  cases  where  the  data  does  not  seem  to  fit  any  known 
distribution.  It  allows  us  to  simulate  the  essential  characteristics  of  the  data  without  returning  the  same  data  points 
over  and  over  again.  For  example,  Figure  87  shows  bifurcated  data  in  the  x-y  plane.  Without  an  understanding  of 
the  underlying  mechanism,  it  would  be  very  difficult  to  fit  a  distribution  to  this  data.  However,  it  is  easy  to  create 
synthetic  realizations  using  stochastic  interpolation.  We  must  first  place  the  data  in  a  file  named  “stochasticData.” 
Then  the  following  code  will  produce  another  realization  such  as  that  shown  in  Figure  88. 


# include  ciostream . h> 

#include  <stdlib.h> 
tinclude  <unistd.h> 

#include  "Random. h" 

void  main(  int  argc,  char*  argv[]  ) 

{ 

long  seed  =  long(  getpid( )  ); 

if  (  argc  2  )  seed  *  atoi(  argv[  1  ]  ); 

Random  rv(  seed  ) ; 

for  (  int  i  =  0;  i  <  N_DATA;  i++  )  { 

point  p  *  rv. stochasticlnterpolation( ) ; 
cout  «  p.x  «  "  "  «  p.y  «  endl; 

} 

} 
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Figure  87.  Stochastic  Data  for  Stochastic  Interpolation.  Figure  88.  Synthetic  Data  via  Stochastic  Interpolation. 
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6.6  Combining  Maximal  Avoidance  With  Distributions 

It  is  also  possible  to  combine  techniques.  For  example,  we  could  use  maximal  avoidance  to  generate  points  in  space 
and  then  perform  a  transformation  to  get  the  density  of  points  appropriate  for  a  desired  distribution.  This  amounts  to 
using  maximal  avoidance  rather  than  uniform  random  to  generate  the  points  for  transformation.  However,  since 
maximal  avoidance  is  deterministic  rather  than  pseudo-random,  the  price  we  pay  is  that  the  pattern  generated  will 
always  be  the  same.  Figure  89  is  an  example  to  generate  bivariate  normal. 


Figure  89.  Combining  Maximal  Avoidance  With  Bivariate  Normal. 

The  source  code  is  as  follows. 


tinclude  <iostream. h> 
# include  "Random. h" 


void  main(  void  ) 

{ 

Random  rv; 

const  int  N  =  2000;  //  number  of  points 


const  double  MU  -  0 . ; 
const  double  SIGMA  -  1 . ; 
const  double  X_MIN  =  -1.; 
const  double  X_MAX  =  1 . ; 
const  double  Y_MIN  =  -1.; 
const  double  Y_MAX  =  1 . ; 


double  data [  2  ] ; 

for  (  int  i  =  0;  i  <  N;  i++  )  { 


rv. avoidance (  data,  2  ); 

double  x  =  X_MIN  +  {  X_MAX  -  X_MIN  ) 
double  y  =  Y_MIN  +  (  Y_MAX  -  Y_MIN  ) 
double  p  =  x*x  +  y*y; 
if  (  p  <  1.  )  { 

cout  «  MU  +  SIGMA  *  X  *  sqrt (  -2.  *  log(  p  )  /  p  )  « 

«  MU  +  SIGMA  *  y  *  sqrt (  -2.  *  log(  p  )  /  P  )  «  endl; 

} 

} 

} 


*  data[  0  ]; 

*  data [  1  ] ; 
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6.7  Application  of  Tausworthe  Random  Bit  Vector 

Various  systems  in  a  combat  vehicle  are  composed  of  critical  components  that  are  functionally  related  through  the 
use  of  one  or  mor e  fault  trees .  For  instance.  Figure  90  shows  the  fault  tree  for  the  main  gun  of  the  M1A1  tank 
(Ploskonka  et  al.  1988). 


Figure  90.  Fault  Tree  for  Main  Armament  of  Ml  Al  Tank. 

Each  of  the  18  components  comprising  this  diagram  is  considered  critical  because  its  dysfunction  may  have  an 
adverse  affect  upon  the  gun  functioning.  However,  as  long  as  there  is  at  least  one  continuous  path  of  functioning 
components  from  the  top  node  to  the  bottom  node,  the  main  gun  will  still  function.  It  is  clear,  for  example,  that  the 
fault  tree  as  a  whole  is  more  sensitive  to  the  loss  of  component  1  than  it  is  to  the  loss  of  component  8.  There  are 
other  cases  where  it  is  not  so  clear.  Here,  we  show  how  the  random  bit  vector  can  be  used  to  rank  the  sensitivity  of 
the  components  based  upon  the  functioning  of  this  fault  tree.  We  need  a  bit  vector  of  length  n  =  18  and,  in  order  to 
generate  all  the  possible  combinations  of  states,  we  need  to  take  218  -  1  =  262, 143  samples,  the  cycle  length.  We 
are  guaranteed  that  each  combination  will  occur  once  and  only  once  in  the  cycle  (although  in  random  order).  The 
following  code  will  print  out  the  state  of  each  of  the  18  components  along  with  the  state  of  the  fault  tree. 
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♦include  <iostream.h> 

♦include  <stdlib.h> 

♦include  <unistd.h> 

♦include  "Random. h" 

void  main(  void  ) 

{ 

const  unsigned  LEN  -  18; 

const  int  N  *  int(  pow(  2,  LEN  )  -  1  ); 

unsigned  seed  >=  123456789; 

bool  c [  LEN  ] ; 

Random  rv; 

for  (  int  n  *  0;  n  <  N;  n++  )  { 

rv . tausworthe (  seed,  LEN,  c  );  //  assign  a  state  to  each  component 

for  (  int  i  -  0;  i  <  LEN;  i++  )  cout  «  c[  i  ]  «  " 

c[  0  ]  |-  c[  X  ]  |  c[  2  ]  |  c[  3  ]  I  c[  4  ]  I  c[  5  ]; 

c [  8  ]  6-  c[  9  ]; 

c[  8  ]  I-  c [  10  ]; 

c[  7  ]  6-  C[  8  ]; 

c t  6  ]  I-  c[  7  ]  |  c[  11  ]  +  c[  12  ]; 

c[  13  ]  |=  c [  14  ]  ; 
c[  6  ]  6-  C[  13  ]; 

c[  0  ]  |-  c[  6  ]  |  c[  15  ]  I  c[  16  ]  I  c[  17  ]; 

cout  «  c[  0  ]  «  endl; 

} 

} 

Next,  we  determine  the  correlation  coefficient  between  each  of  the  components  and  the  overall  state  of  the  fault  tree. 
The  results  are  shown  in  Table  6. 


//  number  of  components 
//  number  of  combinations 
//  seed  for  tausworthe  generator 
//  boolean  component  array 


Table  6.  Correlation  Coefficient  Between  Component  and  Fault  Tree  Deactivation 


Component 

Correlation  Coefficient 

i 

0.024713 

2 

0.024713 

3 

0.024713 

4 

0.024713 

5 

0.024713 

6 

0.024713 

7 

0.00494267 

8 

0.00216247 

9 

0.000309005 

10 

0.000309005 

11 

0.00123574 

12 

0.00494267 

13 

0.00494267 

14 

0.0179169 

15 

0.0179169 

16 

0.024713 

17 

0.024713 

18 

0.024713 

It  is  apparent  that  the  components  fall  into  groups  based  upon  their  significance  to  the  overall  fault  tree.  Sorting  the 
components  from  most  significant  to  least  significant  gives  the  results  shown  in  Table  7. 
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Table  7.  Ranking  of  Component  Significance  to  Fault  Tree  Deactivation 


Group 

Components 

i 

1,2, 3, 4, 5, 6,16, 17,  and  18 

KI 

Hand  15 

Bi 

7, 12,  and  13 

K9 

8 

Kfl 

11 

mM 

9  and  10 

This  shows  that  components  fall  into  six  distinct  classes  based  upon  their  significance  to  the  overall  fault  tree.  It  also 
gives  a  certain  degree  of  verification  that  the  coding  is  correct  since  it  tell  us  that  all  the  components  in  a  given  group 
occur  in  the  fault  tree  with  the  same  footing.  It  is  clear  by  examing  Figure  90  that  components  7, 12,  and  13,  for 
instance,  all  have  the  same  significance  to  the  vulnerability  of  the  main  gun. 

Now,  for  the  case  shown  here,  where  the  number  of  components  is  18,  we  could  easily  write  a  program  consisting  of 
a  series  of  nested  loops  to  enumerate  all  218  - 1  nonzero  states.  However,  we  do  not  have  to  add  very  many  more 
components  before  this  direct  enumeation  approach  will  not  be  feasible.  For  instance,  30  components  results  in 
more  than  one  billion  possible  states.  The  Tausworthe  random  bit  generator  provides  an  alternative  approach.  We 
could  replicate,  say  100,000  times,  knowing  that  the  bit  vectors  will  be  unique  as  well  as  “random.”  As  such,  it  pro¬ 
vides  a  useful  tool  for  the  examination  and  verification  of  fault  trees. 
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APPENDIX  A: 

UNIFORM  RANDOM  NUMBER  GENERATOR 

The  underlying  random  number  generator  used  is  one  that  will  generate  uniformly  distributed  random  numbers  on 
the  half-open  interval  [0, 1).  Any  other  distribution  of  random  numbers,  with  few  exceptions,  results  from  mathe¬ 
matical  transformation.  Thus,  it  is  clear  that  we  need  a  good  generator  of  random  numbers  U(0, 1).  Here  we  discuss 
some  criteria  of  what  constitutes  a  good  generator.  After  this,  we  discuss  some  tests  that  can  be  applied  and  show 
the  test  results  when  applied  to  several  candidate  generators. 

A.1  Selection  Criteria 

We  will  consider  four  attributes  of  what  constitutes  a  good  random  number  generator. 

•  Range 

A  good  generator  should  have  the  capability  of  producing  a  large  range  of  random  numbers. 

•  Speed 

A  good  generator  should  be  fast. 

•  Portability 

The  generator  should  give  the  same  sequence  of  random  numbers,  regardless  of  what  computer  it  is  run  on. 
As  a  minimum,  we  need  the  explicit  source  code. 

•  Validation 

A  good  generator  should  exhibit  apparent  randomness  by  passing  certain  well-accepted  validation  tests. 

Six  candidate  generators  were  examined  and  tested  for  consideration  as  the  underlying  generator  for  U (0, 1).  They 
are  all  linear  conguential  generators  of  the  form 

Xi+1  =  (aX,  +  c)  (modm).  (A-l) 

This  is  a  recurrence  relation  for  generating  the  next  number  in  the  sequence,  given  the  multiplier  a,  the  increment  c, 
and  the  modulus  m.  The  linear  conguential  method  is  a  well-accepted  technique  for  generating  a  sequence  of  num¬ 
bers  that  take  on  the  appearance  of  randomness.  They  possess  the  following  properties.1,2 

•  They  are  relatively  fast,  requiring  few  arithmetic  operations. 

•  They  produce  a  range  of  values  no  greater  than  the  modulus  m. 

•  They  have  a  period  or  cycle  length  no  greater  than  m. 

•  However,  they  are  not  free  of  sequential  correlations. 

We  consider  two  generators  available  in  standard  C  libraries,  rand  and  drand48 — and  thus,  commonly  used — and 
four  versions  of  a  generator  proposed  by  Park  and  Miller.2,2  First,  we  give  a  short  description  of  each  one  and  list  the 
source  code. 

rand 

This  is  the  ANSI  C  version  of  a  random  number  generator  and  has  been  implemented  in  C  as  follows: 

unsigned  long  next  =  1; 

void  srand(  unsigned  int  seed  )  /*  set  the  seed  */ 

{ 

next  =  seed; 

} 

int  rand(  void  )  /*  return  a  pseudo -random  number  */ 

{ 

next  =  next  *  1103515245  +  12345; 

return  (  unsigned  int  )(  next  /  65536  )  %  32768; 

} 


Knuth,  D.  E.  The  Art  of  Computer  Programming,  Volume  2:  Seminumerical  Algorithms.  London:  Addison-Wesley,  1969. 

Press,  W.  H.,  S.  A.  Teukolsky,  W.  T.  Vetterling,  and  B.  P.  Flannery.  Numerical  Recipes  in  C:  The  Art  of  Scientific  Computing.  New  York: 
Cambridge  University  Press,  Second  Edition,  1992. 

Park,  S.  K.,  and  K.  W.  Miller.  Communications  of  the  ACM.  Vol.  31 ,  pp.  1 192-1201, 1988. 
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Now,  it  may  appear  that  the  modulus  here  is  32,768,  but,  in  fact,  it  is  232,  due  to  the  fact  that  “unsigned  long”  is 
32  bits  in  length  and  so  modulus  232  is  implicit.  However,  due  to  the  fact  that  the  returned  number  is  “%  32768”  or 
“modulo  32768”  means  that  it  is  only  capable  of  producing,  at  most,  32,768  distinct  numbers.  Thus,  it  has  a  period 
of  232  and  a  range  no  greater  than  32,768. 

drand4  8 

This  also  is  in  the  standard  C  library.  This  generator  uses  three  16-bit  integer  words  in  order  to  perform  48-bit  arith¬ 
metic.  The  constants  are  a  =  25, 214, 903, 917,  c  =  1 1,  and  m  =  248,  so  that  it  thas  a  very  long  period. 


//  dxand4  8 . C : 
// 

// 

// 

// 

// 

// 

// 

// 

// 

// 


A  portable  implementation  of  drand48,  this  routine  will 
generate  precisely  the  same  stream  of  pseudo- random  numbers 
in  the  interval  [0,1)  as  drand48r  for  the  same  seed  value. 
The  drand48  algorithm  is  based  on  the  linear  congruential 
x  [  n  +  1  ]  -  (  a  *  x  [  n  ]  +  c  )  (  mod  m  ) , 
where 

a  -  25214903917  ( 0x5DEECE66D) , 
c  -  11  ( OxB ) ,  and 

m  -  2~48  -  281474976710656  (0x1000000000000), 
while  using  4 8 -bit  integer  arithmetic,  but  ignoring 
multiplication  and  addition  overflows  of  two  16 -bit  integers. 


static  const  unsigned  int  N_BITS  *  16; 

static  const  double  TWO_16  -  1.  /  (  1L  <<  N_BITS  ); 


static  const  unsigned  int  MASK 


unsigned (  1  <<  (  N_BITS  -  1  )  )  + 

unsigned(  1  <<  (  N_BITS  -  1  )  )  -  1;  //  65535 


static 

static 

static 

static 

static 

static 

static 


const  unsigned  int  X0 
const  unsigned  int  XI 
const  unsigned  int  X2 
const  unsigned  int  A0 
const  unsigned  int  A1 
const  unsigned  int  A2 
const  unsigned  int  C 


static  unsigned  int  x[  3  ]  « 
static  unsigned  int  a[  3  ]  * 
static  unsigned  int  c  -  C; 


-  0x330E; 

-  OxABCD; 

-  0x1234; 
=  0XE66D; 

-  OxDEEC; 

-  0x5; 

-  OxB; 


//  13070 
//  43981 
//  4660 
//  58989 
//  57068 
//  5 
//  11 


{  X0,  XI,  X2  ); 
{  A0,  Al,  A2  }; 


static  void  next(  void  ) ; 


void  my_srand48(  long  seed  ) 

C 

x[  0  ]  -  XO; 

x[  1  ]  -  unsigned (  seed  )  6  MASK;  //  store  low-order  bits 

x[  2  ]  -  (  unsigned (  seed  )  >>  N_BITS  )  &  MASK;  //  store  high-order  bits 

a[  0  ]  -  A0; 

a[  1  ]  “  Al; 

a[  2  ]  =  A2; 

c  -  C; 


double  drand48(  void  ) 

C 

next(); 

return  TWO_16  *  (  TWO_16  *  (  TWO_16  *  x[  0  ]  +  x[  1  ]  )  +  x[  2  ]  ); 


static  void  next(  void  ) 

{ 

unsigned  p[2],q[2),r[2]; 
bool  carry 0,  carryl,  carry2; 

long  prod; 

prod  -  long (  a [  0  ]  )  *  long (  x [  0  ]  ) ; 

p[  0  ]  «  unsigned (  prod  )  &  MASK; 

p[  1  ]  «  unsigned (  prod  >>  N_BITS  )  &  MASK; 

carryO  -  long(  p[  0  ]  )  +  long(  c  )  >  MASK; 

carryl  «  long(  p[  1  ]  )  +  long(  carryO  )  >  MASK; 

p[  0  ]  -  unsigned(  p[  0  ]  +  c  )  &  MASK; 

p[  1  ]  -  unsigned(  p[  1  ]  +  carryO  )  &  MASK; 

prod  -  long(  a[  0  ]  )  *  long(  x[  1  ]  ); 

q[  0  ]  -  unsigned (  prod  )  &  MASK; 

q[  1  ]  -  unsigned (  prod  >>  N_BITS  )  &  MASK; 
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carry 0  =  long(  p[  1  ]  )  +  long(  q[  0  ]  )  >  MASK? 

p[  1  ]  =  unsigned(  p[  1  ]  +  q[  0  ]  )  &  MASK; 

prod  =  long<  a[  1  ]  )  *  long(  x[  0  ]  )? 

r[  0  ]  =  unsigned(  prod  )  &  MASK; 

r[  1  3  =  unsigned (  prod  >>  N_BITS  )  &  MASK? 


carry 2  =  long(  p[  1  ]  )  +  long(  r[  0  ]  )  >  MASK; 
x[  2  ]  =  unsigned (  carryO  +  carryl  +  carry2  +  q[  1  ]  +  r[  1  ]  + 

a[  0  ]  *  x[  2  ]  +  a[  1  ]  *  x[  1  ]  + 

a[  2  ]  *  x[  0  ]  )  &  MASK? 

x[  1  ]  =  unsigned(  p  [  1  3  +  *[  0  3  )  &  MASK? 

x[  0  3  =  unsigned(  p  [  0  ]  )  &  MASK? 


ranO 

This  is  the  “minimal”  random  number  generator  of  Park  and  Miller.3  The  constants  are  a  =  16,807,  c  =  0,  and 
m  -  231  - 1  (a  Mersenne  prime).  It  uses  Schrage’s  method4  to  implement  the  recurrence  formula  without  overflow 
of  a  32-bit  word.  It  has  a  period  of  231  —  2  =  2, 147, 483, 646. 


//  ranO.C:  Minimal  random  number  generator  of  Park  and  Miller. 

//  Returns  a  uniform  random  deviate  in- [0,1)  with  a  period  of  2" 31-2. 

//  Set  or  reset  seed  to  any  integer  value  (except  the  value  0)  to 

//  initialize  the  sequence;  seed  must  not  be  altered  between  calls  for 

//  successive  deviates  in  the  sequence. 

//  Ref:  Press,  W.  H.,  et  al.,  "Numerical  Recipes  in  C",  Cambridge,  1992,  p.  278 
#include  <assert.h> 


double  ran0(  longs  seed  ) 

{ 

static  const  long  M  « 

static  const  long  A  = 

static  const  long  Q  = 

static  const  long  R  = 

static  const  double  F  = 


2147483647? 

16807; 

127773; 

2836; 

1.  /  M; 


//  Mersenne  prime  2~31-1 

//  7~  5  is  a  primitive  root  of  M 


assert (  seed  !-  0  ); 
long  k  =  seed  /  Q; 

seed  -  (  seed  -  k  *  Q  )  *A-k*R; 
if  (  seed  <  0  )  seed  +=  M? 

return  seed  *  F; 

} 


//  since  it  wonft  work  if  seed  =  0 

//  compute  seed  =  (  A  *  seed  )  %  M 
//  without  overflow 
//  by  Schragefs  method 

//  convert  to  a  floating  point 


rani 

This  is  the  same  as  ranO,  with  the  same  constants,  but  also  makes  use  of  Bays-Durham  shuffle5  to  break  up  sequen¬ 
tial  correlations  inherent  in  the  linear  congruential  method. 


//  ranl.C:  Random  number  generator  of  Park  and  Miller  with  Bays-Durham  shuffle. 
//  Returns  a  uniform  random  deviate  in  [0,1)  with  a  period  of  2~31-2. 

//  Set  the  seed  to  any  integer  value  (except  zero)  to  initialize  the 

//  sequence;  seed  must  not  be  altered  between  calls  for  successive 

//  deviates  in  the  sequence. 

//  Ref:  Press,  W.  H.,  et  al.,  "Numerical  Recipes  in  C",  Cambridge,  1992,  p.  278 
#include  <assert.h> 


double  ranl(  longs  seed  ) 

{ 

static  const  long  M 
static  const  long  A 
static  const  long  Q 
static  const  long  R 
static  const  double  F 
static  const  short  NTAB 


2147483647?  //  Mersenne  prime  2" 3 1-1 

16807?  //  7" 5  is  a  primitive  root  of  M 

127773? 

2836; 

1.  /  M? 

32? 


Park,  S.  K.,  and  K.  W.  Miller.  Communications  of  the  ACM.  Vol.  31,  pp.  1 192-1201, 1988. 

Schrage,  L.  ACM  Transactions  on  Mathematical  Software .  Vol.  5,  pp.  132-138, 1979. 

Bays,  C.,  and  S.  D.  Durham.  “Improving  a  Poor  Random  Number  Generator.”  ACM  Transactions  on  Mathematical  Software ,  Vol.  2,  pp. 
59-64,  1976. 
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static  const  long  DIV  -  1  +  (  M  -  1  )  /  NTAB? 

assert(  seed  !-  0  );  //  since  it  won't  work  if  seed  -  0 

static  long  value  *  0; 
static  long  table [  NTAB  ] ; 

if  (  value  --  0  )  {  //  load  the  shuffle  table  the  first  time  through 

for  (  int  i  *  NTAB  +  7;  i  >=  0;  i--  )  {  //  first  perform  8  warm-ups 

long  k  -  seed  /  Q; 

seed  *  A  *  (  seed  -k*Q)  -  k  *  R? 
if  (  seed  <  0  )  seed  +«  M? 

if  (  i  <  NTAB  )  table [  i  ]  =  seed? 

} 

value  -  table [  0  ] ; 

} 

long  k  *  seed  /  Q?  //  compute  seed  -  (  A  *  seed  )  %  M 

seed  *  A  *  (  seed  -k*Q)-k*R?  //  without  overflow 

if  (  seed  <  0  )  seed  +*  M?  //  by  Schrage's  method 

int  i  *  value  /  DIV?  //  Bays -Durham  shuffle  algorithm 

value  -  table [  i  ] ? 
tablet  i  ]  -  seed; 

return  value  *  F;  //  return  a  floating  point 


ranlv2 

This  is  version  2  of  rani.  It  uses  the  multiplier,  a  =  48, 271. 


//  ranlv2.C:  Random  number  generator  of  Park  and  Miller  (version  2)  with 
//  Bays-Durham  shuffle  algorithm. 

//  Returns  a  uniform  random  deviate  in  [0,1)  with  a  period  of  2^31-2. 

//  Set  the  seed  to  any  integer  value  (except  zero)  to  initialize  the 

//  sequence;  seed  must  not  be  altered  between  calls  for  successive 

//  deviates  in  the  sequence. 

//  Ref:  Press,  W.  H.,  et  al.,  "Numerical  Recipes  in  C",  Cambridge,  1992,  p.  278 

#include  <assert.h> 

double  ranlv2(  longs  seed  ) 

C 

static  const  long  M  =  2147483647?  //  Mersenne  prime  2^31-1 

static  const  long  A  -48271?  //  this  is  a  prime  number 

static  const  long  Q  -  44488? 

static  const  long  R  -  3399; 

static  const  double  F  =*  1.  /  M? 

static  const  short  NTAB  -  32? 

static  const  long  DIV  =  1  +  (  M  -  1  )  /  NTAB? 

assert(  seed  !-  0  )?  //  since  it  won't  work  if  seed  -  0 

static  long  value  -  0; 
static  long  table  [  NTAB  ]  ? 

if  (  value  —  0  )  {  //  load  the  shuffle  table  the  first  time  through 

for  (  int  i  -  NTAB  +  7 ;  i  >=  0;  i--  )  {  //  first  perform  8  warm-ups 

long  k  -  seed  /  Q; 

seed  *  A  *  (  seed  -  k  *  Q  )  -k*R; 
if  (  seed  <  0  )  seed  +=  M; 

if  (  i  <  NTAB  )  table[  i  3  =  seed? 

} 

value  -  table  [  0  ] ; 

} 

long  k  -  seed  /  Q?  //  compute  seed  -  (  A  *  seed  )  %  M 

seed  *  A  *  (  seed  -k*Q)-k*R?  //  without  overflow 

if  (  seed  <  0  )  seed  +»  M?  //  by  Schrage's  method 

int  i  *  value  /  DIV;  //  Bays-Durham  shuffle  algorithm 

value  -  table [  i  ]  ; 
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} 


tablet  i  ]  -  seed; 
return  value  *  F; 


//  return  a  floating  point 


ranlv3 

This  is  version  3  of  rani.  It  uses  the  multiplier  a  =  69, 621. 


//  ranlv3.C:  Minimal  random  number  generator  of  Park  and  Miller  (version  3  ). 

//  Returns  a  uniform  random  deviate  in  [0,1)  with  a  period  of  2~31-2. 

//  Set  the  seed  to  any  integer  value  (except  zero)  to  initialize  the 

//  sequence;  seed  must  not  be  altered  between  calls  for  successive 

//  deviates  in  the  sequence. 

//  Refs  Press,  W.  H.,  et  al.,  "Numerical  Recipes  in  C",  Cambridge,  1992,  p.  278 

#include  <assert.h> 

double  ranlv3(  longs  seed  ) 

*  static  const  long  M  -  2147483647;  //  Mersenne  prime  2~31-1 

static  const  long  A  *  69621; 

static  const  long  Q  ■  30845; 

static  const  long  R  ■  23902; 

static  const  double  F  *  1.  /  M; 

static  const  short  NTAB  -  32; 

static  const  long  DXV  *  1  +  (  M  -  1  )  /  NTAB; 

assert(  seed  I-  0  );  //  since  it  won't  work  if  seed  -  0 

static  long  value  *  0; 
static  long  table [  NTAB  ] ; 

if  (  value  ■■  0  )  {  //  load  the  shuffle  table  the  first  time  through 

for  (  int  i  -  NTAB  +  7;  i  >*  0;  i--  )  {  //  first  perform  8  warm-ups 

long  k  -  seed  /  Q; 

seed  -  A  *  (  seed  -k*Q)-k*R; 
if  (  seed  <  0  )  seed  +«  M; 

if  (  i  <  NTAB  )  tablet  i  ]  *  seed; 

} 

value  -  table [  0  ] ; 

} 

long  k  -  seed  /  Q;  //  compute  seed  -  (  A  *  seed  )  %  M 

seed  -  A  *  (  seed  -k*Q)-k*R;  //  without  overflow 

if  (  seed  <  0  )  seed  +-  M;  //  by  Schrage's  method 

int  i  -  value  /  DIV;  //  Bays -Durham  shuffle  algorithm 

value  -  tablet  i  ) ; 
tablet  i  ]  “  seed; 

return  F  *  value;  //  return  a  floating  point 

} 

One  of  the  most  important  considerations  when  choosing  a  random  number  generator  is  how  many  distinct  random 
numbers  it  generates.  Let  N  be  the  number  of  random  deviates  requested,  and  let  n  be  the  actual  number  of  distinct 
random  deviates  generated.  Table  A-l  shows  the  results. 

Table  A-l.  Capability  to  Generate  Distinct  Random  Numbers 
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It  is  not  unreasonable  to  require  100,000  or  more  random  numbers  for  a  simulation,  so  these  results  alone  should  dis¬ 
qualify  rand  as  a  serious  random  number  generator. 

Next,  let  us  compare  the  speed  of  the  generators.  Table  A-2  shows  the  time  to  generate  one  million  random  deviates 
on  a  Silicon  Graphics  workstation  with  a  200-MHz  processor. 


Table  A-2.  Uniform  Random  Number  Generator  Timings 


Number 
Requested y  N 

Computer  Time  ( s ) 

rand  drand48  ranO  rani  ranlv2  ranlv3 

106 

0.37  1.17  0.88  1.21  1.21  1.20 

The  relative  timings  are  as  follows. 

rand:  1 

drand48:  3.2 
ranO:  2.4 

rani:  3.3 

ranlv2:  3.3 
ranlv3:  3.3 

Although  drand48  is  over  three  times  slower  than  rand,  it  is  a  much  better  generator.  Also,  as  is  apparent  from 
Table  A-2,  the  computer  time  involved  in  generating  uniform  deviates  is  not  likely  to  be  a  significant  burden  in  a 
simulation. 

Since  we  have  the  source  code  for  all  six  of  these  generators,  they  all  satisfy  the  portability  requirement. 

A.2  Validation  Tests 

Next,  we  subject  the  six  candidate  random  number  generators  to  four  tests  of  randomness. 


A.2.1  Chi-Square  Test 

The  chi-square  test  is  a  check  that  the  generated  random  numbers  are  distributed  uniformly  over  the  unit  interval.  In 
order  to  compute  a  value  of  x2  from  our  random  number  generators,  we  first  subdivide  the  interval  [0,1]  into  k 
subintervals  of  equal  length  and  then  count  the  number  nt  that  fall  into  the  ith  bin  when  a  total  of  n  random  numbers 
have  been  generated.  The  computed  value  of  x2  is  obtained  from  the  foimula 


where 


X2  =  ~  -  n/*)2> 

» 1=1 


(A-2) 


k  is  the  number  of  bins, 
n  is  the  total  number  of  random  deviates,  and 
ni  is  the  number  of  random  deviates  in  the  ith  bin. 

As  a  general  rule  when  carrying  out  this  test,  k  should  be  at  least  100  and  n/k  should  be  at  least  5.  As  the  number  of 
random  samples,  n,  was  increased,  we  increased  the  number  of  bins,  ky  such  that  the  ratio  n/k  remained  constant  at 
8.  The  critical  value  of  x 2  can  be  calculated  with  the  aid  of  the  formula,  valid  for  large  kf 

xl-u-d  =  (*  ■ -  1)  (l  -  9(/_  1}  +  -  l)]j  .  (A-3) 

where 

k  is  the  numbers  of  bins, 
a  is  the  significance  level,  and 

Z\-a  is  the  1  -  a  critical  point  of  the  standard  normal  distribution,  and  has  the  value  1.645  when  a  =  0. 05. 

The  results  are  displayed  in  Table  A-3.  (The  seed  used  for  all  of  these  tests  was  123456789.)  All  of  the  generators 
do  about  the  same  until  the  number  of  bins  exceeds  32,768.  Beyond  this  point,  rand  completely  fails  the  chi-square 
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test  This  is  a  manifestation  of  the  fact  that  rand  is  only  capable  of  generating,  at  most,  32,768  different  random 
numbers  (i.e.,  its  range).  Now,  a  good  generator  may  still  fail  the  test  for  a  particular  seed.  Indeed,  by  definition,  a 
perfectly  good  generator  should  fail  the  test  5 %  of  the  time.  So,  the  fact  that  the  other  generators  occasionally  go 
slightly  above  the  critical  value  is  probably  not  significant.  The  fact  that  both  ranlv2  and  ran2v3  never  fail  the  test 
may  be  significant  but  would  have  to  be  tested  further. 


Table  A-3.  Chi-Square  Test  Results  (at  a  0.05  Level  of  Significance) 


Number  of 
Samples,  n 


Number  of 
Bins,  k 


Critical 

Value 


Computed  Value  of  x2 
rand  drand48  ranO  rani 


ranlv2  ranlv3 


1024 

2048 

4096 

8192 

16384 

32768 

65536 

131072 

262144 

524288 

1048576 


128 

256 

512 

1024 

2048 

4096 

8192 

16384 

32768 

65536 

131072 


154 

293 

565 

1099 

2153 

4245 

8403 

16682 

33189 

66132 

131914 


137 

264 

506 

912 

2064 

4199 

8246 

16310 

32737 

588888 

3275060 


125 

264 

529 

1031 

2077 

4248 

8235 

16634 

32960 

65577 

130942 


149 

268 

538 

972 

2021 

4141 

8416 

16712 

33122 

66154 

131715 


151 

268 

534 

972 

2009 

4135 

8415 

16718 

33128 

66167 

131733 


108 

259 

502 

1026 

2065 

4153 

8170 

16381 

32703 

65439 

131104 


140 

271 

480 

984 

1932 

3945 

8043 

16196 

32363 

64893 

130817 


Notice  that  ranlv3  does  very  well  on  this  test  We  also  ran  20  independent  tests  with  different  seeds  for  the  case 
when  n  =  131,072  and  it  =  16,384.  We  found  that  rani  failed  the  test  three  times  (or  15%  of  the  time),  ranlv2 
failed  the  test  two  times  (or  10%  of  the  time),  and  ranlv3  never  failed. 


A.2.2  Sequential  Correlation  Tests 

Let  Ut  ~  17(0, 1)  be  the  ith  random  number  from  a  random  number  generator.  Now,  if  the  Ufs  are  really  independent 
and  identically  distributed  (HD)  17(0, 1)  random  variates,  then  the  nonoverlapping  d-tuples 

U1  =  (U1,U2,---,Ud),  V2  =  (Ud+l,Ud+2,---,U2d),  •••  (A-4) 

should  be  HD  random  vectors  distributed  uniformly  on  the  ^-dimensional  unit  hypercube,  [0, 1)**.  Let  ni,i2-id  be  the 
number  of  ri’s  having  first  component  in  subinterval  iu  second  component  in  subinterval  i2,  •  •  •,  and  dth  component 
in  subinterval  id.  Then  the  computed  chi-square  is  given  by  the  formula 

,d  *  *  *  /  _  \2 

<a-5> 

n  i,  =  li2=l  ii= iV  K  7 

That  is,  this  quantity  should  have  an  approximate  x2  distribution  with  kd  -  1  degrees  of  freedom.  Table  A-4  shows 
the  test  results  for  sequential  pairs  of  random  numbers. 


Table  A-4.  Two-Dimensional  Chi-Square  Test  Results  (at  a  0.05  Level  of  Significance) 


Number  of 
Samples ,  n 

Number  of 
Bins,  k 

Critical 

Value 

rand 

drand48 

Computed  Value  of  x2 
ranO  rani 

ranlv2 

ranlv3 

2048 

162 

293 

263 

273 

256 

235 

276 

267 

8192 

322 

1099 

1031 

1065 

1066 

1012 

981 

1053 

32768 

642 

4245 

4053 

4164 

4096 

3956 

4015 

4163 

131072 

1282 

16682 

16138 

16412 

16690 

16380 

16303 

16283 

524288 

2562 

66132 

65442 

66009 

65526 

65788 

65507 

65168 

2097152 

5122 

263335 

260149 

263072 

261968 

262948 

262913 

261518 

With  one  exception,  ranO  for  131072  samples,  they  all  pass  this  test. 
Table  A-5  shows  the  test  results  for  sequential  triplets  of  random  numbers. 
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Table  A-5.  Three-Dimensional  Chi-Square  Test  Results  (at  a  0.05  Level  of  Significance) 


Number  of 
Samples ,  n 

Number  of 
Bins ,  k 

Critical 

Value 

rand 

drand48 

Computed  Value  of  x2 
ranO  rani 

ranlv2 

ranlv3 

■ 

43 

83 

54 

65 

52 

69 

63 

63 

83 

565 

479 

522 

495 

519 

491 

536 

32768 

163 

4245 

4210 

4096 

4224 

4131 

4071 

4222 

262144 

323 

33189 

32872 

32486 

32558 

32626 

32654 

32675 

2097152 

643 

263335 

262365 

261818 

261986 

261716 

262854 

262002 

All  six  generators  pass  this  test. 


A.2.3  Runs-Up  Test 

This  is  a  test  for  independence.  Each  sequence  of  random  numbers  is  examined  for  unbroken  subsequences  of  maxi¬ 
mal  length  within  which  the  numbers  increase  monotonically;  such  a  subsequence  is  called  a  run-up.  Define 

(number  of  runs  of  length  i  for  i  =  1 , 2,  •  •  • ,  5 
number  of  runs  of  length  >6  for  i  =  6 

and  let  n  be  the  total  length  of  the  sequence.  Then  the  test  statistic  is  given  by  the  formula 

6  6 


s  =  (»/  -  «*<)  («;•  -  nbj). 


=  U- 1 


where  Atj  is  the  ijth  element  of  the  matrix  (Knuth  1969) 


A  - 


and  the  Vs  are  given  by 


'4529.4 

9044.9 

13568. 

18091. 

22615. 

27892.  ' 

9044.9 

18097. 

27139. 

36187. 

45234. 

55789. 

13568. 

27139. 

40721. 

54281. 

67852. 

83685. 

18091. 

36187. 

54281. 

72414. 

90470. 

111580. 

22615. 

45234. 

67852. 

90470. 

113262. 

139476. 

.  27892. 

55789. 

83685. 

111580. 

139476. 

172860.  _ 

0>i,b2, 

5  11  19  29  1  \ 

i  6  ’  24  ’  120  ’  720  ’  5040  ’  840  j 

(A-6) 


(A-7) 


(A-8) 


(A-9) 


For  large  n  (n  >  4000),  R  will  have  an  approximate  chi-square  distribution  with  6  degrees  of  freedom,  so  that 
£6,0.95  =  12. 6  for  an  a  =  0. 05  significance  level.  Table  A-6  shows  the  results  from  this  test 


Table  A-6.  Runs-Up  Test  Results  (at  a  0.05  Level  of  Significance) 


Number  of 
Samples,  n 

£2 

Critical  Value 

rand 

drand48 

Computed  Value  ofR 
ranO  rani  ranlv2 

ranlv3 

104 

12.6 

1.69 

5.69 

4.58 

2.93 

2.97 

7.36 

10s 

12.6 

4.37 

3.78 

0.92 

6.27 

5.54 

3.31 

106 

12.6 

9.07 

5.67 

6.59 

7.27 

8.93 

11.8 

All  the  generators  pass  this  test. 

A.2.4  Kolmogorov-Smirnov  (K-S)  Test 

This,  like  the  chi-square  test,  is  a  test  for  uniformity.  But,  unlike  the  chi-square  test,  the  K-S  test  does  not  require  us 
to  bin  the  data.  Instead,  it  is  a  direct  comparison  between  the  empirical  distribution  and  F(x),  the  cumulative  distri¬ 
bution — which  in  this  case,  is  simply  F(x)  =  x.  The  K-S  statistic  is  the  largest  vertical  distance  between  the  theoret¬ 
ical  distribution  and  the  empirical  distribution.  In  practice,  we  compute 
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r*  .1 

/  'l  — *  rvi  Oy 

i- 1" 

In  'T 

un  —  max 

1  ZiZn 

X'l 

n 

and  Dn  =  max  {D+n,  Dn}.  (A-10) 

We  reject  the  null  hypothesis  that  the  generated  numbers  are  uniformly  distributed  over  the  interval  [0, 1)  if 


Vn  +  0. 12  +  >  Ci-a, 

where  the  critical  value  for  a  =  0. 05,  is  1. 358.  Table  A-7  shows  the  test  results. 


(A-ll) 


Table  A-7.  K-S  Test  Results  (at  a  0.05  Level  of  Significance) 


Number  of 
Samples ,  n 

K-S 

Critical  Value 

rand 

Computed  Value  of(4n  +  0. 12  +  0. 1  l/Vn)Z>„ 
drand48  ranO  rani  ranlv2 

ranlv3 

103 

1.358 

0.860 

0.821 

0.794 

0.700 

0.651 

0.543 

104 

1.358 

0.780 

0.693 

0.948 

0.928 

0.394 

0.860 

105 

1.358 

0.870 

0.830 

0.956 

0.950 

1.035 

0.943 

106 

1.358 

0.613 

0.697 

1.021 

1.026 

1.085 

0.650 

We  see  that  all  six  generators  pass  this  test. 

From  these  four  validation  test  results,  we  see  that,  with  the  exception  of  rand,  all  the  generators  are  about  the  same. 
Overall,  ranlv3  seems  to  perform  the  best,  and  so  it  was  chosen  as  the  uniform  random  number  generator  in  the 
Random  class.  Incidently,  it  should  be  mentioned  that  the  Random  class  permits  more  than  one  independent  random 
number  stream.  For  example,  if  we  set 


Random  rvl(  seedl  ),  rv2(  seed2  ); 

then  rvl  and  rv2  are  distinct  objects  so  that  the  stream  generated  by  rvl  alone  is  not  altered  by  the  presence  of 
rv2.  This  can  be  useful  if  we  want  to  vary  a  selected  stochastic  process  while  retaining  all  other  source  of  stochasti- 
cism. 
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Intentionally  left  blank. 
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APPENDIX  B: 

RANDOM  CLASS  SOURCE  CODE 


The  definition  and  implementation  of  the  Random  class  is  consolidated  into  a  single  header  file.  Random. h.  As  a 
consequence,  it  is  only  necessary  to  include  this  header  file  in  any  program  that  makes  use  of  random  number  distri¬ 
butions.  For  example,  here  is  a  simple  program  that  makes  use  of  the  Random  class. 


//  Sample  program  for  using  the  Random  class 
# include  <iostream.h> 

#  include  "Random,  h"  <=  include  the  definition  of  the  Random  class 

void  main(  void  ) 

*  Random  rv;  <=  declare  a  random  variate 


} 


for  (  int  i  -  0;  i  <  1000;  i++  ) 
cout  <<  rv. normal () 

<<  endl; 


<=  reference  the  normal  distribution  ( with  default  parameters) 


If  this  code  is  contained  in  the  file  main .  C,  then  it  can  be  compiled  and  linked  into  an  executable  program,  main, 
with  the  aid  of  the  GNU  C++  compiler  by  invoking  file  following  UNIX  command. 


C++  -o  main  main. C  -lm 

This  program  will  set  the  random  number  seed  from  the  UNIX  process  ED.  If  we  want  to  set  the  seed  explicitly  to 
123456789,  simply  make  the  following  replacement 

Random  rv;  =*  Random  rv(  123456789  ); 

If  at  any  time  later  in  the  program  we  want  to  reset  the  seed,  to  say  773,  we  can  do  that  with  the  following  statement. 

Random  rv. reset (  773  )? 

The  remaining  pages  of  this  appendix  contain  the  complete  source  code  of  the  Random  class. 
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//  Random. h:  Definition  and  Implementation  of  Random  Number  Distribution  Class 


#ifndef  RANDOM_H 
idefine  RANDOM_H 

iinclude  <fstream.h> 

#include  <math.h> 

#include  <limits.h> 
iinclude  <unistd.h> 
iinclude  <assert.h> 
iinclude  <stl.h> 

//  Constants  for  Tausworthe  random  bit  generator 

//  Ref:  Tausworthe,  Robert  C.f  "Random  Numbers  Generated  by  Linear  Recurrence 
//  Modulo  Two,"  Mathematics  of  Computation,  vol.  19,  pp.  201-209,  1965. 

static  const  unsigned  DEGREE_MAX  =32;  //  max  degree  (bits  per  word) 

static  const  unsigned  BIT[  1  +  DEGREE_MAX  ]  =  ( 

//  Hexidecimal  Value  Degree 

// - -  -  - 


0x00000000, 

// 

0 

0 

0x00000001, 

// 

2" 

0 

1 

0x00000002, 

// 

2* 

1 

2 

0x00000004, 

// 

2~ 

2 

3 

0x00000008, 

// 

2~ 

3 

4 

0x00000010, 

// 

2~ 

4 

5 

0x00000020, 

// 

2~ 

5 

6 

0x00000040, 

// 

2~ 

6 

7 

0x00000080, 

// 

2~ 

7 

8 

0x00000100, 

// 

2~ 

8 

9 

0x00000200, 

// 

2A 

9 

10 

0x00000400, 

// 

2~ 

10 

11 

0x00000800, 

// 

2~ 

11 

12 

0x00001000, 

// 

2~ 

12 

13 

0x00002000, 

// 

2~ 

13 

14 

0x00004000, 

// 

2T 

14 

15 

0x00008000, 

// 

2~ 

15 

16 

0x00010000, 

// 

2~ 

16 

17 

0x00020000, 

// 

2~ 

17 

18 

0x00040000, 

// 

2~ 

18 

19 

0x00080000, 

// 

2~ 

19 

20 

0x00100000, 

// 

2~ 

20 

21 

0x00200000, 

// 

2~ 

21 

22 

0x00400000, 

// 

2~ 

22 

23 

0x00800000, 

// 

2A 

23 

24 

0x01000000, 

// 

2~ 

24 

25 

0x02000000, 

// 

2~ 

25 

26 

0x04000000, 

// 

2~ 

26 

27 

0x08000000, 

// 

2~ 

27 

28 

0x10000000, 

// 

2~ 

28 

29 

0x20000000, 

// 

2~ 

29 

30 

0x40000000, 

// 

2~ 

30 

31 

0x80000000 

// 

2~ 

31 

32 

}; 

//  Coefficients  that  define  a  primitive  polynomial  (mod  2) 

//  Ref:  Watson,  E.  J.,  "Primitive  Polynomials  (Mod  2)," 

//  Mathematics  of  Computation,  vol.  16,  pp.  368-369,  1962. 


static  const  unsigned  MASK[  1  +  DEGREE_MAX  ]  =  ( 

BIT(0],  //  0 

BIT [ 0 ] ,  //  1 

BIT [1] ,  //  2 

BIT [1 ] ,  //  3 

BIT [1 ] ,  //  4 

BIT[2],  //  5 

BIT [1] ,  //  6 

BIT [  1  ]  ,  //  7 

BIT [4 ]  +  BIT [ 3 ]  +  BIT [2] ,  //  8 

BIT [4]  ,  //  9 

BIT[3],  //  10 

BIT [2] ,  //  11 

BIT [6]  +  BIT [4 ]  +  BIT[1],  //  12 

BIT [4 ]  +  BIT [3 ]  +  BIT [1]  ,  //  13 

BIT [5]  +  BIT [3]  +  BIT [1]  ,  //  14 

BIT [1] ,  //  15 

BIT [5]  +  BIT [3]  +  BIT [2] ,  //  16 

BIT [3 ] ,  //  17 

BIT  [5  J  +  BIT  [2  3  +  BIT  (13  ,  //  18 

BIT [53  +  BIT [2 ]  +  BIT[1],  //  19 

BIT [3 3,  //  20 

BIT[2],  //  21 

BIT [1]  ,  //  22 

BIT [ 5 ]  ,  //  23 

BIT [4 ]  +  BIT [ 3 J  +  BIT[1] ,  //  24 

BIT [33,  //  25 

BIT [6 ]  +  BIT [23  +  BIT [1] ,  //  26 

BIT[5]  +  BIT [2]  +  BIT [ 1 ] ,  //  27 

BIT [3 ],  //  28 

BIT [2 ] ,  //  29 
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BIT {6]  +  BIT [4]  +  BIT[1],  //  30 
BITJ3],  //  31 
BIT[7]  +  BIT[5]  +  BIT[3]  +  BIT[2]  +  BIT[1]  //  32 


}; 

//  for  convenience ,  define  some  data  structures  for  bivariate  distributions 
struct  cartesianCoord  [  //  cartesian  coordinates  in  2-D 

double  xf  y? 

cartes ianCoords  operator+={  const  cartesianCoords  p  )  { 
x  +=  p.x; 

y  +=  p.y? 

return  *this; 

cartesianCoords  operator-={  const  cartesianCoords  p  )  { 
x  -=  p.x; 

y  -=  p.y; 

return  *this; 

cartesianCoords  operator*:5  (  double  scalar  )  { 
x  *-  scalar; 
y  *-  scalar; 
return  *this; 

cartesianCoord&  operator/=(  double  scalar  )  { 
x  /=  scalar; 
y  /=  scalar; 
return  *this; 

} 


struct  sphericalCoord  (  //  spherical  coordinates  on  unit  sphere 

double  theta,  phi; 

double  x (  void  )  {  return  sin<  theta  )  *  cos(  phi  );  }  //  x- coordinate 

double  y(  void  )  {  return  sin<  theta  )  *  sin<  phi  );  }  //  y-coordinate 

double  z(  void  )  {  return  cos(  theta  );  }  //  z -coordinate 

}? 

class  Random  { 

//  friends  list 

//  overloaded  relational  operators 

friend  bool  operator==(  const  Bandom&  p,  const  Randoms  q  ) 

bool  equal  =  (  p . _seed  ==  q._seed  )  S&  (  p._next  *»  q._next  ); 
for  (  int  i  =  0;  i  <  p._NTAB;  i++  ) 

equal  =  equal  &&  (  p._table[  i  ]  ==  q._table[  i  ]  ); 
return  equal; 

} 

friend  bool  operator !=(  const  Randoms  p,  const  Randoms  q  ) 

C 

return  ! (  p  ==  q  ) ; 

} 

//  overloaded  stream  operator 

friend  istreams  operator >>(  istreams  is.  Randoms  rv  ) 

cout  <<  "Enter  a  random  number  seed  " 

«  "(between  1  and  "  «  L0NG_MAX  -  1  <<  ",  inclusive):  "  «  endl; 

is  >>  rv._seed; 

assert(  rv._seed  !=  0  ss  rv._seed  !=  LONG_MAX  ); 
rv . _seedTable ( ) ; 
return  is; 

} 

public: 

Random (  long  seed  )  //  constructor  to  set  the  seed 

{ 

assert{  seed  !=  0  SS  seed  !=  LONG_MAX  ); 

_seed  =  seed; 

_seedTable ( ) ; 

_seed2  =  _seed  |  1;  //  for  tausworthe  random  bit  generation 

} 

Random  (  void  )  //  default  constructor  uses  process  id  to  set  the  seed 

t 

_seed  =  long(  getpid( )  ); 

_seedTable ( ) ; 

_seed2  =  _seed  |  1;  //  for  tausworthe  random  bit  generation 

} 

~Random(  void  )  //  default  destructor 
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{ 

} 


Random (  const  Randoms  r  )  //  copy  constructor  (copies  current  state) 

_seed  -  r ._seed; 

_seed2  =  r._seed2; 

//  copy  the  current  state  of  the  shuffle  table 
_next  =  r._next; 

for  (  int  i  =  0;  i  <  _NTAB;  i++  )  stable [  i  ]  =  r._table[  i  ]; 


Randoms  operator=(  const  Randoms  r  )  //  overloaded  assignment 

if  (  *this  ==  r  )  return  *this; 

_seed  =  r._seed; 

_seed2  =  r._seed2; 

//  copy  the  current  state  of  the  shuffle  table 
_next  =  r._next; 

for  (  int  i  =  0;  i  <  _NTAB;  i++  )  _table[  i  ]  =  r._table[  i  ]; 
return  *this; 

} 

//  utility  functions 

void  reset (  long  seed  )  //  reset  the  seed  explicitly 

assert (  seed  !=  0  ss  seed  1=  L0NG_MAX  ); 

_seed  =  seed; 

_seedTable{ ) ; 

_seed2  =  _seed  j  1;  //so  that  all  bits  cannot  be  zero 


void  reset (  void  )  //  reset  seed  from  current  process  id 

_seed  =  long(  getpid()  ); 

_seedTable  ( ) ; 

_seed2  =  _seed  J  1;  //so  that  all  bits  cannot  be  zero 
//  Continuous  Distributions 

double  arcsine  <  double  xMin  =  0 . ,  double  xMax  =  1.  )  //  Arc  Sine 

double  q  =  sin<  M_PI_2  *  _u()  ); 
return  xMin  +  (  xMax  -  xMin  )  *  q  *  q; 


double  beta(  double  v,  double  w,  //  Beta 

^  double  xMin  =  0.,  double  xMax  =1.  )  //  (v  >  0.  and  w  >  0.) 

if  (  v  <  w  )  return  xMax  -  (  xMax  -  xMin  )  *  beta(  w,  v  ); 
double  yl  =  gamma (  0 . ,  1 . ,  v  )  ; 
double  y2  =  gamma{  0.,  1.,  w  ); 

return  xMin  +  (  xMax  -  xMin  )  *  yl  /  (  yl  +  y2  ) ; 


double  cauchy (  double  a  =  0.r  double  b  “  1.  )  //  Cauchy  (or  Lorentz) 

//  a  is  the  location  parameter  and  b  is  the  scale  parameter 

//  b  is  the  half  width  at  half  maximum  (HWHM )  and  variance  doesn't  exist 

assert(  b  >  0.  ); 

return  a  +  b  *  tan(  M_PI  *  uniform(  -0.5,  0.5  )  ); 


double  chi Square (  int  df  )  //  Chi-Square 

assert(  df  >=  1  ); 

return  gamma (  0.,  2.,  0.5  *  double (  df  )  ) ; 

double  cosine (  double  xMin  =  0.,  double  xMax  =  1.  )  //  Cosine 

assert (  xMin  <  xMax  ); 

double  a  =  0 . 5  *  (  xMin  +  xMax  ) ;  //  location  parameter 

double  b  =  (  xMax  -  xMin  )  /  M_PI;  //  scale  parameter 

return  a  +  b  *  asin(  uniform(  -1.,  1.  )  ); 

} 

double  doubleLog(  double  xMin  =  -1.,  double  xMax  =  1.  )  //  Double  Log 

assert{  xMin  <  xMax  ); 


94 


double  a  =  0-5  *  (  xMin  +  xMax  ); 
double  b  =  0  -  5  *  (  zMaz  -  xMin  ) ; 


//  location  parameter 
//  scale  parameter 


} 


if  (  bernoulli(  0.5  )  ) 
else 


return  a  + 
return  a  - 


b  *  _u()  *  _u<); 
b  *  _u()  *  _u(); 


double  erlang ( 

{ 

assert(  b  > 

double  prod 
for  (  int  i 

return  -b  * 


double  bf  int  c 
0.  &&  c  >=  1  ); 
=  1-; 

=  0;  i  <  c;  i++ 
log(  prod  ); 


} 

double  exponential {  double  a  “ 

( 

assert<  c  >  0.0  ); 


)  //  Erlang  (b  >  0. 


)  prod  *=  _u(); 


0 . ,  double  c  =  1 .  ) 


and  c  >=  1) 


//  Exponential 
//  location  a,  shape  c 


return  a  -  c  *  log(  _u()  ); 

} 

double  extremeValue {  double  a  =  0 . ,  double  c  =  1 . 

{ 

assert(  c  >  0.  ); 


//  Extreme  Value 
//  location  a,  shape  c 


return  a  +  c  *  log(  -log(  _u()  )  ); 

} 

double  f Ratio (  int  v,  int  w  )  //  F  Ratio  (v  and  w  >=  1) 

{ 

assert(  v  >=  1  &6  w  >=  1  ); 

return  (  chiSquare(  v  )  /  v  )  /  (  chiSquare<  w  )  /  w  ); 

} 

double  gamma (  double  af  double  b,  double  c  )  //  Gamma 

(  //  location  a,  scale  b, 

assert{  b  >  0.  &&  c  >  0.  )? 


shape  c 


const  double  A  =  X .  /  sqrt (  2 .  *  c  -  1 .  ) ; 

const  double  B  »  c  -  log (  4 .  > ; 

const  double  Q  =  c  +  1.  /A; 

const  double  T  =  4.5; 

const  double  D  =  1 .  +  log (  T  ) ; 

const  double  C  =  1.  +  c  /  M_E; 


if  (  c  <  1.  )  { 

while  (  true  >  [ 

double  p  =  C  *  _u( ) ; 
if  <  P  >  1.  )  { 

double  y  =  ~log(  (  C  -  p  )  /  c  ); 

if  (  _u<)  <=  pow<  y,  c  -  1.  )  )  return  a  +  b  *  y; 

} 

else  C 

double  y  *  pow(  pr  1.  /  c  ); 

if  <  _u<)  <=  exp(  -y  )  )  return  a  +  b  *  y; 

} 

} 

else  if  (  c  ==  1.0  )  return  exponential (  a,  b  ); 
else  [ 

while  (  true  )  C 

double  pi  =  _u( ) ; 
double  p2  =  _tt( ) ; 

double  v  =  A  *  log(  pi  /  (  1.  -  pi  )  )? 
double  y  =  c  *  exp(  v  ); 
double  z  =  pi  *  pi  *  p2; 
double  w~B+Q*v-y; 

if  (w+D-T*z>=0.  | |  w  >—  log(  z  )  )  return  a  +  b  *  y; 

} 

} 

} 


double  laplace(  double  a  =  0.,  double  b  -  1. 

{ 

assert (  b  >  0.  ); 


//  Laplace 

//  (or  double  exponential) 


//  composition  method 

if  (  bernoulli (  0.5  )  )  return  a  +  b  *  log<  _u( )  ); 
else  return  a  -  b  *  log(  _u< )  ); 


double  logarithmic (  double  xMin  =  0.f  double  xMax  =  1.  )  //  Logarithmic 

{ 

assert (  xMin  <  xMax  ); 


double  a  =  xMin; 
double  b  =  xMax  -  xMin; 


//  location  parameter 
//  scale  parameter 
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//  use  convolution  formula  for  product  of  two  I ID  uniform  variates 
return  a  +  b  *  _u()  *  _u(); 

} 

double  logistic (  double  a  =  0 . ,  double  c  =  1 .  )  //  Logistic 

assert(  c  >  0.  ); 

return  a  -  c  *  log{  1.  /  _u( )  -  1.  ); 

} 

double  lognormal (  double  a,  double  mu,  double  sigma  )  //  Lognormal 

C 

return  a  +  exp(  normal (  mu,  sigma  )  ); 

} 

double  normal (  double  mu  =  0.,  double  sigma  =  1.  )  //  Normal 

{ 

assert(  sigma  >0.  ); 

static  bool  f  =  true; 
static  double  p,  pi,  p2; 
double  q; 

if  (  f  )  C 
do  [ 

pi  =  uniform(  -1.,  1.  ); 
p2  =  uniform (  -1.,  1.  ) ; 
p  =  pi  *  pi  +  p2  *  p2; 

}  while  (  p  >=  I.  ) ; 
q  =  pi; 

} 

else 

q  ~  p2 ; 
f  =  !f; 

return  mu  +  sigma  *  q  *  sqrt(  -2.  *  log<  p  )  /  p  ); 

} 

double  parabolic  (  double  xMin  =  0 . ,  double  xMax  =  1 .  )  //  Parabolic 

l 

assert (  xMin  <  xMax  ); 

double  a  =  0 . 5  *  (  xMin  +  xMax  ) ;  //  location  parameter 

double  yMax  =  _parabola(  a,  xMin,  xMax  );  //  maximum  function  range 

return  userSpecified(  .parabola,  xMin,  xMax,  0.,  yMax  ); 

} 

double  pareto (  double  c  )  //  Pareto 

{  //  shape  c 

assert(  c  >  0.  ); 

return  pow(  _u(),  -1.  /  c  ); 

} 

double  pearson5(  double  b,  double  c  )  //  Pearson  Type  5 

[  //  scale  b,  shape  c 

assert(  b  >  0.  &&  c  >  0.  ); 

return  1.  /  gamma (  0.,  1.  /  b,  c  ); 

} 

double  pearson6(  double  b,  double  v,  double  w  )  //  Pearson  Type  6 

{  //  scale  b,  shape  v  &  w 

assert<  b>0.  &&  v  >  0.  &&  w  >  0.  ); 

return  gamma (  0.,  b,  v  )  /  gamma (  0.,  b,  w  ); 

} 

double  power (  double  c  )  //  Power 

{  //  shape  c 

assert(  c  >  0.  ); 

return  pow(  _u(),  1.  /  c  ); 

} 

double  rayleigh(  double  a,  double  b  )  //  Rayleigh 

C  //  location  a,  scale  b 

assert(  b  >  0.  ); 

return  a  +  b  *  sqrt(  -log(  _u< )  )  ); 

} 

double  studentT(  int  df  )  //  Student's  T 

[  //  degres  of  freedom  df 

assert(  df  >=  1  ) ; 

return  normal()  /  sqrt(  chiSquare{  df  )  /  df  ) ; 

} 

double  triangular (  double  xMin  =0.,  //  Triangular 
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doable  xMax  =  1. ,  //  with  default  interval  [0,1) 

double  c  =0.5)  //  and  default  mode  0.5 

{ 

assert <  xMin  <  xMax  &&  xMin  <=  c  &&  c  <=  xMax  ); 

double  p  =  _u(),  q  =  1.  -  p; 

if  (  p  <=  (  c  -  xMin  )  /  (  xMax  -  xMin  )  ) 

return  xMin  +  sqrt(  (  xMax  -  xMin  )  *  (  c  -  xMin  )  *  p  ); 

else 

return  xMax  -  sqrt(  (  xMax  -  xMin  )  *  (  xMax  -  c  )  *  q  )? 

} 

double  uniform (  double  xMin  =0.,  double  xMax  =  1.  )  //  Uniform 

(  //  on  [xMin, xMax) 

assert (  xMin  <  xMax  ); 

return  xMin  +  (  xMax  -  xMin  )  *  _u< ) ; 


double  userSpecified( 
doublet  *usf  ){ 
double, 
double, 
double  ) , 

double  xMin,  double  xMax, 
double  yMin,  double  yMax  ) 


//  User -Specified  Distribution 
//  pointer  to  user -specified  function 
//  x 
//  xMin 
//  xMax 

//  function  domain 
//  function  range 


assert (  xMin  <  xMax  &&  yMin  <  yMax  ); 

double  x,  y,  areaMax  =  (  xMax  -  xMin  )  *  (  yMax  -  yMin  ); 
//  acceptance- rejection  method 
do  [ 

x  =  uni form (  0.0,  areaMax  )  /  (  yMax  -  yMin  )  +  xMin? 
y  =  uni form (  yMin,  yMax  ); 

]  while  (  y  >  usf(  x,  xMin,  xMax  )  ); 

return  x? 


double  weibull(  double  a,  double  b,  double  c  )  //  Weibull 


assert(  b  >  0.  £&  c  >  0.  )? 

return  a  +  b  *  pow{  -log(  _u< )  ),  1.  /  c  ); 


//  location  a,  scale  b, 
//  shape  c 


//  Discrete  Distributions 

bool  bernoulli (  double  p  =  0.5  )  //  Bernoulli  Trial 

C 

assert (  0.  <=  p  &&  p  <=  1.  ) ; 
return  _u{ )  <  p; 

} 

int  binomial (  int  n,  double  p  )  //  Binomial 

C 

assert(  n  >=  1  &&  0.  <=  p  &&  p  <=  1.  ); 
int  sum  =0; 

for  (  int  i  =  0;  i  <  n;  i++  )  sum  +=  bernoulli (  p  ); 
return  sum; 

} 

int  geometric  (  double  p  )  //  Geometric 

t 

assert (  0.  <  p  &&  p  <  1.  ); 

return  int(  log<  _u< )  )  /  log(  1.  -  p  )  ); 

) 

int  hypergeometric  (  int  n,  int  N,  int  K  )  //  Hypergeometric 

{  //  trials  n,  size  N, 

assert{  0  <=  n  &&  n  <=  N  &&  N  >=  1  &&  K  >=  0  ) ;  //  successes  K 

int  count  =  0; 

for  (  int  i  =  0;  i  <  n;  i++,  N--  )  { 

double  p  =  double (  K  )  /  double {  N  ); 
if  (  bernoulli (  p  )  )  {  count++;  K--;  } 

} 

return  count; 


void  multinomial (  int  n, 

double  p[] , 
int  count [ J , 

int  m  ) 

{ 


//  Multinomial 

//  trials  n,  probability  vector  p, 
//  success  vector  count, 

//  number  of  disjoint  events  m 
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assert (  m  >=  2  ) ;  //at  least  2  events 
double  sum  =  0 . ; 

for  (  int  bin  =  0;  bin  <  m;  bin++  )  sum  +=  p[  bin  ]; 
assert (  sum  ==  1.  ); 


//  probabilities 
//  must  sum  to  1 


for  (  int  bin  =  0;  bin  <  m;  bin++  )  count [  bin  ]  =  0;  //  initialize 

//  generate  n  uniform  variates  in  the  interval  [0,1)  and  bin  the  results 
for  (  int  i  =  0;  i  <  n;  i++  )  ( 

double  lower  =  0.,  upper  =  0.,  u  =  „u(); 
for  (  int  bin  =  0;  bin  <  m;  bin++  )  ( 

//  locate  subinterval,  which  is  of  length  p[  bin  ], 

//  that  contains  the  variate  and  increment  the  corresponding  counter 

lower  =  upper; 
upper  +=  p [  bin  ] ; 

if  (  lower  <=  u  &&  u  <  upper  )  [  count [  bin  ]++;  break;  ) 


} 

int  negativeBinomial (  int  s,  double  p  )  //  Negative  Binomial 

C  //  successes  s,  probability  p 

assert  (  s  >=  1  &&  0.  <  p  &&  p  <  1.  ); 

int  sum  *  0; 

for  (  int  i  =  0;  i  <  s;  i++  )  sum  +=  geometric (  p  ); 
return  sum;. 


) 

int  pascal (  int  s,  double  p  ) 

[ 

return  negativeBinomial (  s,  p  )  +  s; 

} 


//  Pascal 

//  successes  s,  probability  p 


int  poisson(  double  mu  ) 

{ 

assert  (  mu  >  0 .  ) ; 

double  a  =  exp(  -mu  ); 
double  b  =  1 . ; 


//  Poisson 


} 


int  i; 

for  (  i  =  0;  b  >=  a;  i++  )  b 
return  i  -  1; 


_u<); 


int  unif ormDiscrete (  int  i,  int  j  )  //  Uniform  Discrete 

C  //  inclusive  i  to  j 

assert(  i  <  j  ); 


} 


return  i  +  int(  (  j  -  i  +  1  )  *  _u()  ); 


//  Empirical  and  Data-Driven  Distributions 

double  empirical (  void  )  //  Empirical  Continuous 

static  vector <  double  >  x,  cdf; 

static  int  n; 

static  bool  init  =  false; 

if  (  ! init  )  [ 

if stream  in(  "empiricalDistribution"  ); 

if  (  !in  )  { 

cerr  <<  "Cannot  open 
exit(  1  ) ; 

} 

double  value,  prob; 

while  (  in  >>  value  >>  prob  )  [  //  read  in  empirical  distribution 

x . push_back {  value  ) ; 
cdf .push_back(  prob  ); 

} 

n  =  x. size( ) ; 

init  =  true; 

//  check  that  this  is  indeed  a  cumulative  distribution 


) 


assert (  0.  ==  cdf[  0  ]  &&  cdf[  n  -  1  ]  ==  1.  ); 

for  (  int  i  =  1;  i  <  n;  i++  )  assert<  cdf[  i  -  1  ]  <  cdf [  i  J  ); 


double  p  =  _u ( ) ; 

for  (  int  i  =  0;  i  <  n  -  1;  i++  ) 

if  (  cdf [  i  j  <=  p  &&  p  <  cdf [  i  +  1  ]  ) 

return  x[  i  ]  +  <  x[  i  +  1  ]  -x[i]  )  * 

(  p  -  cdf [  i  ]  )  /  (  cdf [  i  +  1  ] 

return  x [  n  -  1  ] ; 


°df [  i  ]  ) ; 
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int  empiricalDiscrete (  void  )  //  Empirical  Discrete 

f 

static  vector<  int  >  k; 

static  vector <  double  >  f[  2  ];  //  f[  0  )  is  pdf  and  f[  1  ]  is  cdf 

static  double  mar ; 

static  int  n; 

static  bool  init  *  false; 

if  (  Unit  )  { 

if stream  in  (  "empiricalDiscrete"  )? 
if  (  I in  )  ( 

cerr  <<  "Cannot  open 
exit(  1  ); 

} 

int  value; 
double  freq; 

while  (  in  >>  value  >>  freq  )  [  //  read  in  empirical  data 

k.push_back<  value  ); 
f[  0  ] .push_back(  freq  )? 

} 

n  *  k.size< ); 
init  =  true; 

//  form  the  cumulative  distribution 

f(  1  ].push_back(  f[  0  ][  0  ]  ); 

for  (  int  i  =  1;  i  <  n;  i++  > 

f[  1  ] .push_back<  f[l)[i-l]+f[0][i]  ); 

//  check  that  the  integer  points  are  in  ascending  order 

for  (  int  i  =  1?  i  <  n;  i++  )  assert(  k[  i  -  1  ]  <  k[  i  ]  ); 

max  =  f£  1  ][  n  -  1  ]; 

} 

//  select  a  uniform  variate  between  0  and  the  max  value  of  the  cdf 
double  p  =  uniform (  0.,  max  ); 

//  locate  and  return  the  corresponding  index 

for  {  int  i  =  0;  i  <  n;  i++  )  if  (  p  <-  f [  1  ] I  i  ]  )  return  k[  i  ]; 
return  k [  n  -  1  ] ; 


double  sample (  bool  replace  =  true  ) 

[ 

static  vector <  double  >  v; 
static  bool  init  =  false; 
static  int  n; 
static  int  index  =  0; 

if  (  I init  )  { 

ifstream  in(  "sampleData"  ); 
if  (  !in  )  [ 

cerr  <<  "Cannot  open 
exit(  1  ); 

} 

double  d; 
while  (  in  >>  d  )  v.push_back(  d  ); 
in.close{ ) ; 
n  =  v. size( ) ; 
init  s  true; 

if  (  replace  ==  false  )  (  //  sample  without  replacement 

//  shuffle  contents  of  v  once  and  for  all 

//  Kef:  Knuth,  D.  E.,  The  Art  of  Computer  Programming,  Vol.  2: 
//  Seminumerical  Algorithms.  London:  Addison -Wes ley,  1969. 

for  (  int  i  =  n  -  1;  i  >  0;  i —  )  { 
int  j  =  int(  (  i  +  1  )  *  _u( )  ); 
swap(  v[  i  J,  v[  j  ]  ); 

} 

} 

} 

//  return  a  random  sample 
if  (  replace  ) 

return  v{  uniformDiscrete(  0,  n 
else  { 

assert(  index  <  n  ); 
return  v[  index ++  ]; 

} 


void  sample (  double  x[],  int  ndim  )  //  Sample  from  a  given  distribution 

{  //  of  multi -dimensional  data 

static  const  int  N_DIM  =  6; 
assert (  ndim  <=  N_DIM  ); 

static  vector<  double  >  v [  N_DIM  J ; 


//  sample  w/  replacement 

//  sample  w/o  replacement 
//  retrieve  elements 
//  in  sequential  order 


//  Sample  w  or  w/o  replacement  from  a 
//  distribution  of  1-D  data  in  a  file 
//  vector  for  sampling  with  replacement 
//  flag  that  file  has  been  read  in 
//  number  of  data  elements  in  the  file 
//  subscript  in  the  sequential  order 
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static  bool  init  =  false; 
static  int  n; 

if  (  linit  )  { 

if stream  in(  "sampleData"  ); 
if  (  !in  )  [ 

cerr  <<  "Cannot  open 
exit (  1  ) ; 

} 

double  d; 

while  (  lin.eof()  )  [ 

for  {  int  i  =  0;  i  <  ndim;  i++  )  ( 
in  >>  d; 

v[  i  ].push_back(  d  ); 

} 

} 

in. close ( ) ; 
n  =  v[  0  ] .size( ) ; 
init  =  true; 

} 

int  index  =  uniformDiscrete(  0,  n  -  1  ); 

for  (  int  i  =  0;  i  <  ndim;  i++  )  x[  i  ]  =  v[  i  ] [  index  ]? 


//  comparison  functor  for  determining  the  neighborhood  of  a  data  point 
struct  dSquared  s 

public  binary_f unction <  cartesianCoord,  cartesianCoord,  bool  >  [ 
bool  operator()(  cartesianCoord  p,  cartes ianCoord  q  )  { 
return  p.x  *  p.x  +  p.y  *  p.y  <  q.x  *  q.x  +  q.y  *  q.y; 

}? 


cartesianCoord  stochasticInterpolation(  void  )  //  Stochastic  Interpolation 

//  Refs:  Taylor,  M.  S.  and  J.  R.  Thompson,  Computational  Statistics  &  Data 
//  Analysis,  Vol.  4,  pp.  93-101,  1986;  Thompson,  J.  R.,  Empirical  Model 

//  Building,  pp.  108-114,  Wiley,  1989;  Bodt,  B.  A.  and  M.  S.  Taylor, 

//  A  Data  Based  Random  Number  Generator  for  A  Multivariate  Distribution 

//  -  A  User's  Manual,  ARBRL-TR-02439 ,  BRL,  APG,  MD,  Nov.  1982. 

i 

static  vector <  cartesianCoord  >  data; 

static  cartesianCoord  min,  max; 

static  int  m; 

static  double  lower,  upper; 

static  bool  init  =  false; 

if  (  linit  )  { 

ifstream  in(  "stochasticData"  ) ; 
if  (  ! in  )  { 

cerr  <<  "Cannot  open 
exit (  1  ); 

) 

//  read  in  the  data  and  set  min  and  max  values 


min.x  =  min.y  =  FLT_MAX ; 
max.x  =  max.y  ■  FLT_MIN ; 
cartesianCoord  p; 
while  (  in  >>  p.x  >>  p.y  )  { 


min.x 

min.y 

max.x 

max.y 


< 

p.x 

< 

min.x 

? 

p.x 

min.x 

) 

( 

p*y 

< 

min.y 

? 

p.y 

min.y 

) 

< 

p.x 

> 

max.x 

? 

p.x 

max.x 

) 

( 

p.y 

> 

max.y 

? 

p.y 

max.y 

) 

data . push_back (  p  ) ; 

} 

in. close ( ) ; 
init  =  true; 


//  scale  the  data  so  that  each  dimension  will  have  equal  weight 

for  (  int  i  =  0;  i  <  data.size();  i++  )  { 

data [  i  ] . x  =  (  data [  i  ] . x  -  min . x  )  /  (  max . x  -  min . x  ) ; 

data [  i  ) .y  =  (  data[  i  j.y  -  min.y  )  /  (  max.y  -  min.y  ); 


//  set  m,  the  number  of  points  in  a  neighborhood  of  a  given  point 


m  =  data. size ()  /  20; 
if  (  m  <  5  )  m  =  5; 
if  (  m  >  20  )  m  =  20; 


//  5%  of  all  the  data  points 
//  but  no  less  than  5 
//  and  no  more  than  20 


lower  =  <  1.  -  sqrt( 
upper  =  <  1.  +  sqrt( 


3 .  *  (  double (  m  )  -  1 . 
3 .  *  (  double (  m  )  -  1 . 


)  )  )  /  double (  m  > ; 
)  )  >  /  double (  m  ) ; 


//  uniform  random  selection  of  a  data  point  (with  replacement) 
cartesianCoord  origin  =  data[  unif ormDiscrete (  0,  data. size ()  -  1  )  ]; 
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//  make  this  point  the  origin  of  the  coordinate  system 
for  (  int  n  =  0;  n  <  data.size();  n++  )  data!  n  ]  -=  origin; 

//  sort  the  data  with  respect  to  its  distance  (squared)  from  this  origin 
sort(  data. begin ( ) ,  data.end(),  dSquared( )  ); 

//  find  the  mean  value  of  the  data  in  the  neighborhood  about  this  point 

cartesianCoord  mean; 
mean. i  =  mean.y  =  0.; 

for  (  int  n  =  0;  n  <  m;  n++  )  mean  +=  data[  n  ]; 
mean  /=  double (  m  ) ; 

//  select  a  random  linear  combination  of  the  points  in  this  neighborhood 
cartesianCoord  p; 

p.i  =  p.y  =  0.; 

for  (  int  n  =  0;  n  <  m;  n++  )  ( 
double  rn; 

if  (  m  ==  1  )  rn  =  1 . ; 

else  rn  =  uniform (  lower,  upper  ); 

p.x  +=  rn  *  (  data [  n  ] .x  -  mean.x  ); 
p.y  +=  rn  *  (  data[  n  j.y  -  mean.y  ); 

} 

//  restore  the  data  to  its  original  form 

for  (  int  n  =  0;  n  <  data.size();  n++  )  data[  n  ]  +=  origin; 

//  use  mean  and  original  point  to  translate  the  randomly- chosen  point 

p  +=  mean; 
p  +=  origin; 

//  scale  randomly- chosen  point  to  the  dimensions  of  the  original  data 

p.x  =  p.x  *  (  max.x  -  min.x  )  +  min.x; 

p.y  =  p.y  *  (  max.y  -  min.y  )  +  min.y; 

return  p; 


//  Multivariate  Distributions 

cartesianCoord  bivariateNormal (  double  mux  =0.,  //  Bivariate  Gaussian 

double  sigmaX  =1., 
double  muY  =  0 . , 
double  sigmaY  =1.  ) 

{ 

assert(  sigmaX  >  0.  &&  sigmaY  >0.  ); 

cartesianCoord  p; 
p.x  =  normal (  muX,  sigmaX  ); 
p.y  =  normal (  muY,  sigmaY  ); 
return  p; 


cartesianCoord  bivariateUniform(  double  xMin  =  -1.,  //  Bivariate  Uniform 

double  xMax  =  1 . , 

double  yMin  =  -1 . , 
double  yMax  =  1.  ) 

C 

assert (  xMin  <  xMax  &&  yMxn  <  yMax  ); 

double  xO  =  0.5  *  (  XMin  +  xMax  ); 

double  yO  =  0 . 5  *  (  yMin  +  yMax  ) ; 

double  a  =  0.5  *  (  xMax  -  xMin  ); 

double  b  -  0.5  *  (  yMax  -  yMin  ); 

double  x,  y; 

do  ( 

x  ~  uniform(  -1.,  1.  ); 
y  =  uniform (  -1.,  1.  ); 

}  while (  x*x+y*y>l.  ); 

cartesianCoord  p; 
p.x  =  xO  +  a  *  x; 
p.y  =  yO  +  b  *  y; 
return  p; 


cartesianCoord  corrNormal(  double  r,  //  Correlated  Normal 

double  muX  =  0 . , 
double  sigmaX  =  1 . , 
double  muY  =  0 . , 
double  sigmaY  =1.  ) 

assert (  -1.  <*  r  &&  r  <=  1.  &&  //  bounds  on  correlation  coeff 

sigmaX  >  0.  &&  sigmaY  >  0.  );  //  positive  std  dev 


101 


double  x  =  normal(); 
double  y  »  normal{); 

y  =  r  *  x  +  sqrt(  1.  -  r  *  r  )  *  y;  //  correlate  the  variables 
cartesianCoord  p; 

p.x  =  mux  +  sigmaX  *  x;  //  translate  and  scale 

p.y  =  muY  +  sigmaY  *  y; 
return  p; 


cartesianCoord  corrUniform(  double  r,  //  Correlated  Uniform 

double  xMin  =  0 . , 
double  xMax  =  1 . , 
double  yMin  =  0 . , 
double  yMax  =1.  ) 

£ 

assert{  -1.  <=  r  &&  r  <=  1.  &&  //  bounds  on  correlation  coeff 

xMin  <  xMax  &&  yMin  <  yMax  );  //  bounds  on  domain 

double  xO  =  0.5  *  <  xMin  +  xMax  ); 

double  yO  =  0 . 5  *  (  yMin  +  yMax  ) ; 

double  a  =  0 . 5  *  (  xMax  -  xMin  ) ; 

double  b  =  0 . 5  *  (  yMax  -  yMin  ) ; 

double  x,  y  ,* 

do  [ 

x  =  uniform(  -1.,  1.  ); 
y  =  uniform (  -l.r  1.  ); 

)  while  <  x*x+y*y>l.  ); 

y  =  r  *  x  +  sqrt(  1.  -  r  *  r  )  *  y;  //  correlate  the  variables 
cartesianCoord  p; 

p.x=x0+a*x;  //  translate  and  scale 

p.y  =  yO  +  b  *  y; 
return  p; 


sphericalCoord  spherical (  double  thMin  =  0.,  //  Uniform  Spherical 

double  thMax  =  M_PI, 
double  phMin  =  0 . , 
double  phMax  =  2.  *  M_PI  ) 

C 

assert (  0.  <=  thMin  &&  thMin  <  thMax  &&  thMax  <=  M_PI  && 

0.  <=  phMin  &&  phMin  <  phMax  &&  phMax  <-  2 .  *  M_PI  ); 

sphericalCoord  p;  //  azimuth 

p. theta  =  acos(  uniform(  cos{  thMax  ),  cos(  thMin  >  )  );  //  polar  angle 

p.phi  =  uniform(  phMin,  phMax  );  //  azimuth 

return  p; 


void  sphericalND(  double  x[],  int  n  )  //  Uniform  over  the  surface  of 

//  an  n- dimensional  unit  sphere 
C 

//  generate  a  point  inside  the  unit  n- sphere  by  normal  polar  method 
double  r2  =  0 . ; 

for  (  int  i  =  0;  i  <  n;  i++  )  [ 
x [  i  ]  =  normal { ) ; 
r2+=x[i]*x[i]; 

} 

//  project  the  point  onto  the  surface  of  the  unit  n- sphere  by  scaling 

const  double  A  =  1.  /  sqrt(  r2  ); 

for  (  int  i  =  0;  i  <  n;  i++  )  x[  i  ]  *=  A; 


//  Number  Theoretic  Distributions 

double  avoidance (  void  )  //  Maximal  Avoidance  (1-D) 

{  //  overloaded  for  convenience 

double  x [  1  ] ; 
avoidance (  x,  1  ); 
return  x [  0  ] ; 


void  avoidance (  double  x[],  int  ndim  >  //  Maximal  Avoidance  (N-D) 

{ 

static  const  int  MAXBIT  =  30; 
static  const  int  MAXDIM  =  6; 

assert (  ndim  <=  MAXDIM  ); 

static  unsigned  long  ix[  MAXDIM  +  1  ]  =  [  0  }; 
static  unsigned  long  *u [  MAXBIT  +  1  ] ; 

static  unsigned  long  mdeg[  MAXDIM  +!]={//  degree  of 

0,  //  primitive  polynomial 

1,  2,  3,  3,  4,  4 

}; 
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static  unsigned  long  p[  MAXDIM  +  X  ]  =  (  //  decimal  encoded 

0,  //  interior  bits 

0,  lr  1,  2,  1,  4 

static  unsigned  long  v[  MAXDIM  *  MAXBIT  +  1  ]  *  { 

0 , 

1  ,  If  1/  1  /  If  If 

3 ,  If  3 f  3 ,  If  If 

5 r  7,  7,  3,  3 r  5 r 

15,  11,  5,  15,  13,  9 

}; 

static  double  fac; 
static  int  in  =  -1; 
int  j,  k; 

unsigned  long  i,  m,  pp? 

if  (  in  ==  -1  )  { 
in  »  0; 

fac  =  1.  /  (  1L  «  MAXBIT  ); 

for  (  j  =  1,  k  =  0;  j  <=  MAXBIT;  j++,  k  +=  MAXDIM  )  u[  j  ]  =  &v[  k  ]; 
for  (  k  =  1;  k  <=  MAXDIM;  k++  )  { 

for  (  j  *  1?  j  <=  ®deg[  k  ];  j++  )  u[  j  ]£  k  ]  <<=  (  MAXBIT  -  3  )? 
for  (  j  “  mdegE  k  J  +  1;  j  <*  MAXBIT;  j++  )  { 
pp  =  p[  k  ] ; 

i  =  u[  j  -  mdeg[  k  ]  ]  [  k  ] ; 
i  ~  =  (  i  >>  mdeg[  k  ]  ); 

for  (  int  n  =  mdeg[  k  ]  -  1;  n  >=  1;  n —  )  { 
if  (  pp  S  1  )  i  "=  u[  j  -  n  ]  [  k  J ; 

pp  »=  1; 

} 

u[  j  ]  E  k  ]  =  i? 

} 

} 

} 

m  =  in++; 

for  (  j  =  0;  j  <  MAXBIT;  j++,  ra  >>=  1  )  if  (  ! (  m  &  1  )  )  break; 
if  {  j  >=  MAXBIT  )  exit(  1  ); 

H  =  j  *  MAXDIM; 

for  (  k  =  0;  k  <  ndim;  k++  )  [ 
ix[  k  +  1  ]  ~  =  v[  m  +  k  +  1  ] ; 
x[  k  3  =  ix[  k  +  1  ]  *  fac; 

} 


bool  tausworthe(  unsigned  n  )  //  Tausworthe  random  bit  generator 

{  //  returns  a  single  random  bit 

assert{  1  <=  n  &&  n  <=  32  ); 

if  (  _seed2  &  BITE  n  ]  )  { 

_seed2  =  (  (  _seed2  *  MASKE  n  J  )  <<  1  )  |  BIT[  1  ]; 
return  true; 

} 

else  { 

_seed2  <<=  1; 
return  false; 

} 

} 

void  tausworthe (  bool*  bitvec,  //  Tausworthe  random  bit  generator 
unsigned  n  )  //  returns  a  bit  vector  of  length  n 

//  it  is  guaranteed  to  cycle  through  all  possible  combinations  of  n  bits^ 

//  (except  all  zeros)  before  repeating,  i.e.,  cycle  has  maximal  length  2~n-l. 
//  Kef:  Press,  W.  H.,  B.  P.  Flannery,  S.  A.  Teukolsky  and  W.  T.  Vetterling, 
//  Numerical  Recipes  in  C,  Cambridge  Univ.  Press,  Cambridge,  1988. 

*  assert {  1  <=  n  Sfi  n  <=  32  ) ;  //  length  of  bit  vector 

if  (  _seed2  &  BITE  n  ]  ) 

_seed2  =  (  (  _seed2  *  MASKf  n  ]  )  <<  1  )  |  BITE  1  ]; 
else 

_seed2  «=  1; 

for  (  int  i  =  0;  i  <  n;  i++  )  bitvecf  i  ]  =  _seed2  &  (  BIT[  n  J  >>  i  ); 


private : 


static  const  long  _M  =  0x7fffffff;  //  2147483647  (Mersenne  prime  2~31-1) 

static  const  long  _A  =  OxlOffS;  //  69621 

static  const  long  __Q  =  0x7  8 7d;  //  30845 

static  const  long  _R  =  0x5d5e;  //  23902 

static  const  double  _F  =  1-  /  _M; 

static  const  short  _NTAB  =32;  //  arbitrary  length  of  shuffle  table 


static  const  long  _DIV 
long  _table[  _NTAB 

long  _next; 

long  _seed; 

unsigned  _seed2; 


1+ (_M-1 )/_NTAB ; 

];  //  shuffle  table  of  seeds 

//  seed  to  be  used  as  index  into  table 
//  current  random  number  seed 
//  seed  for  tausworthe  random  bit 


void  _seedTable(  void  ) 

C  for  (  int  i  =  _NTAB  +  7;  i  >=  0;  i--  )  { 


//  seeds  the  shuffle  table 
//  first  perform  8  warm-ups 
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long  k  =  _seed  /  _Q; 

_seed  =  _A  *  (  _seed  -  k  *  _Q 
if  (  _seed  <  0  )  _seed  +=  _M; 


-  k  *  _R; 


//  seed  =  (  A  *  seed  )  %  M 
//  without  overflow  by 
//  Schrage' s  method 


if  (  i  <  _NTAB  )  __table[  i  j  =  _seed; 

} 

_next  =  _table [  0  ] ; 


//  store  seeds  into  table 
//  used  as  index  next  time 


double  _u(  void  ) 

C 

long  k  =  _seed  /  _Q; 

_seed  =  _A  *  (  _seed  -  k  *  _Q  ) 
if  (  _seed  <  0  )  _seed  +=  „M; 

int  index  -  _next  /  _DIV; 

_next  =  _ table [  index  ] ; 

_table[  index  ]  =  _seed; 


//  uniform  mg 

//  seed  ■  (  A* seed  )  %  M 
k  *  //  without  overflow  by 

//  Schrage' s  method 

//  Bays -Durham  shuffle 
//  seed  used  for  next  time 
//  replace  with  new  seed 


return  _next  *  _F; 

} 


//  scale  value  within  [0,1) 


static  double  _parabola(  double  x,  double  xMin,  double  xMax  )  //  parabola 

if  (  x  <  xMin  J |  x  >  xMax  )  return  0.0; 

double  a  =  0 . 5  *  (  xMin  +  xMax  ) ;  //  location  parameter 

double  b  *  0.5  *  (  xMax  -  xMin  );  //  scale  parameter 

double  yMax  -  0.75  /  b; 

return  yMax  *  (  1 .  -(x-a)*(x-a)/(b*b)  ); 

} 

}; 


#endif 


GLOSSARY 


Variate 

A  random  variable  from  a  probability  distribution.  Topically,  capital  letters  X,  Y,  •  •  •  are  used  to  denote  variates. 

U~U(0,1) 

Signifies  a  variate  U  is  drawn  or  sampled  from  the  uniform  distribution. 

IID 

Independent  and  identically  distributed. 

Probability  Density  Function  (PDF) 

f(k )  for  a  discrete  distribution. 

/(x)  for  a  continuous  distribution. 

Cumulative  Distribution  Function  (CDF) 

F(k)  for  a  discrete  distribution. 

F(x)  for  a  continuous  distribution. 

Mode 

The  value  of  k  where  f(k)  is  a  global  maximum  for  a  discrete  distribution. 

The  value  of  x  where  / (x)  is  a  global  maximum  for  a  continuous  distribution. 

Median 

The  point  such  that  half  the  values  are  greater  for  a  discrete  distribution.  If  the  points  are  ordered  from  smallest 
to  largest,  then 

,  f*(n+iy2  if » is  odd 

median  \(k*n  +  kM+ 1)/2  if  n  is  even 
Ffxmedia,)  =  1/2  for  a  continuous  distribution. 

Mean 

k-  X  k  /(*)  f°r  a  discr®te  distribution. 

all  values  of  k 


oo 

xf(x)dx  for  a  continuous  distribution. 

-oo 


Variance 

O2  =  X  (*-  kffik )  for  a  discrete  distribution. 

all  values  of  k 
oo 

a1  =  f  (x  -  x)2f(x)dx  for  a  continuous  distribution. 


Standard  Deviation 

a,  the  square  root  of  the  variance. 

Skewness  (as  defined  by  Pearson) 

Sk  =  (mean — mode)  /  standard  deviation. 


Chebyshev’s  Inequality 

For  any  distribution,  Prob{  lx  -  xl  >  kax  >  < 


— ,  where  k  is  any  positive  real  number. 
k 2 


Leibniz’s  Rule 


,  ow  i  dh(x 1 

—  |  f(u,  x)du=  J  -z^du  +  f(b(x),  x)  — — —  f(a(x),  x) 

a(x)  a(x) 


da(x) 

dx 
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